Moving announcement

Swiss Army Spreadsheets has moved!

It is now hosted at http://www.swissarmyspreadsheets.com. This site (http://swissarmyspreadsheets.wordpress.com) will no longer be updated.

The choose() function

Recently I mentioned the choose() statement.

The syntax is =choose(IndexNumber,option1, option2, option3, … , optionLast) Open Office and Google Docs can have up to 30 options, and Excel 2010 can have up to 254. You’re not required to have the maximum number of options, of course! You could in theory have just =choose(IndexNumber,option1), though there wouldn’t be much point.

It can only pick based on numbers, not letters, or words, or which of a set of cells is greatest, or anything like that. However, as I demonstrated in the earlier post, there are a lot of ways to convert other things to numbers.

If your IndexNumber is higher than the number of options you’ve given — say, for example, if you had =choose(3,option1,option2) — then you’ll get an error message. In Google Docs and Microsoft Excel the error is #VALUE!, and in Open Office Calc it’s Err:502.

As with IF(), CHOOSE() also has help wizards in Excel and Open Office, but not in Google Docs.

If() help wizard

If you’re having trouble reading an if() function as it’s written in the cell, or if you’re just having trouble remembering which part is which, you can always use the wizard to help.

For example, the formula I used to only show each year once in a list of dates was =if(month(Date)=1,concatenate(” “,year(Date)),””). However, it might be easier to read as this:

That’s the help wizard screen from Microsoft Excel. As you can see it separates the parts, tells you what should go in each part, and shows you each part’s results.

To get to it, go to the “Formulas” tab and click on “Insert Function”. Then type IF in the search box and click Go. After that, select IF() from the search results, and click OK.

Open Office has something similar:

To get to it, click on Insert and then Function…, or type ctrl-F2. After that there are two ways to get to the wizard. The first and simplest is to type =if() in the Formula box — that will bring the wizard up immediately, as soon as it identifies which function you want. The other option is to select the Logical category, then either select the IF() function and click Next>>, or double-click the IF() function.

Unfortunately, Google Docs doesn’t have this helpful option.

Using if() to made varied concatenate() formulas easier

In the process I talked about last week, there were a lot of different styles that the cells could have, and there were a few different situations the cells could be in. In the example I posted, I manually put the different types of codes where they needed to be. However, I could have used if() to make my life easier in a couple of ways. Continue reading

Simple statistics: counting blank results

When this posts I’ll be on my way to a week long conference, and while I knew this conference was coming I always forget how much time it takes beforehand to prepare, and afterwards to catch up. So while I’m gone I’ll be posting a few things that I started some time back, but which somehow never got posted.

The first of these is a post I wrote back in June 2012 when I was talking about counting blanks and counting entries.

As I’ve noted previously, sometimes a cell will have a formula, but that formula will return a blank result. The simplest of these is =””. Its result is everything that’s between the quotation marks, which is to say: nothing.

Sometimes it can be very useful to find out how many of these there are in a given range. If you were using an if() statement to put things into categories you might want to make sure that everything gets put into a category. For example, suppose you were a teacher of a number of different classes, with different numbers of students, and had set up the following set of rules.
- All grades > 90 are A
- All grades < 90 and >80 are B
- All grades < 80 and >70 are C
- All grades < 70 and >60 are D
- All grades < 60 are F
- Everything else is blank

It looks like a reasonable set of rules, but any grade which is exactly 90 wouldn’t match either of the first two rules, so it would be blank. That’s fine if you meant it to be blank, but not so good if you didn’t. So what you want to know is not how many cells are blank, but how many are not empty but have blank results. If the number is zero, you’re good, but if the number is greater than zero you need to check your rules.

If you only had one class, you could just set the range to be the same number of rows as you have students. With multiple classes, though, if you want to re-use the formula then it’s best to set the range to the largest possible number of students — or simply to the entire column. That means you have to filter out the number of blank lines, in order to find the actual blank results.

In Excel, you can use the fact that blank results are counted by both countblank() and counta() to figure out how many cells are in both categories. =counta(A:A)+countblank(A:A)-rows(A:A) would return the number of formulas in column A returning blank results, because the cells containing empty results would be counted twice – once as blank, and once as non-blank. If you knew there were 30 students you could also just use =countblank(A2:A31), but the more general formula can be anywhere, and it will still return the correct number no matter how many students there are, or whether there’s a column header, or anything else. The general formula will, however, give an inaccurate result if the formula for assigning letter grades extends further than the list of student names.

Google Docs Spreadsheet also treats blank results as blanks, so if you knew there were 30 students and a column header you could use =countblank(A2:A31). But if you wanted a broader version that would work without knowing the number of students you couldn’t use the same formula as in Excel. Google Docs Spreadsheet treats cells as either blank or non-blank, but never both, so you can’t just count the overlap. However, a broader version is still possible. To create one you need a different column that’s guaranteed to have a non-blank entry in every row, to create a fake overlap. In this example it can be pretty much assumed that every student will have a name, so you could do something like: =counta(Names)-counta(Grades)
Essentially this is just finding the blanks by process of elimination. Count the names, and count the grades, and the difference between them is the number of names that don’t have grades, or, in other words, the number of names that have blanks instead of grades.

In Open Office Calc, however, there’s no way to count blank results using a combination of counta() and countblank(), but you can do something much simpler. You can use =countif(A:A;””). I talked about the countif() function here. Although this is a much simpler formula I didn’t suggest it for the others because it won’t work for them. In both Google Docs and Excel it will count all of the empty cells as well as the blank results, same as countblank().

Using concatenate() to write HTML code (presenting data online)

A couple of weeks ago I talked about the options for preparing data for presentation using the tools available in most spreadsheets, but that’s great for using in a word processor or a slide presentation, but we live in an online world. You can save a chart or graph as a picture and post it online, but if you try to do that with a table it will be much less helpful than actually having the numbers online where they can be copied.

There are a couple of ways to easily post the numbers online, and which one is best depends on whether you expect to be updating it regularly, and the accessibility of scripting languages such as PHP. They’re both pretty involved, though, so I’ll talk about one this week, and the other one in another post. Continue reading

Using concatenate() and if() to improve charts and graphs

If you look at the example I discussed last week, specifically the Present_Pie and Present_Line worksheets, you might notice that when making pie charts and the single-value line graph, I had three columns for the source data.

This is significant, because in most cases you’ll only have two columns: one for the axis/category labels, and one for the data itself. After all, that’s all the input the chart or graph will accept.

The reason I have three is that I’m using the third column, which I inserted between the other two columns, to create much more useful label by using concatenate(), and a more readable one by using if().

The basic label’s always static, and contains one piece of information. In the examples I was working with it was either the periods of time (i.e. one per month), or the name of the library. It could also be student names and assignments, if you were recording student scores. It could be baseball players and game dates, if you were tracking how many home runs they scored. It really could be anything, but they come from the row and column headers of your source data, and they’re known before you even start filling the table with data.

What I did was use concatenate to combine the label and the data into a single column. For example, in the total usage pie chart I concatenated together the name of the library and its percentage of the total usage. This means that in the legend, the list of libraries to the right of the pie chart, it says things like “Library 1: 6%” instead of just “Library 1″. Although this adds no new information to the pie chart, it gives the viewer two different ways to see the same information.

If the scale had been important, then I could have used the base value (1773 for Library 1) instead of the percentage in either the in-chart data labels or in the legend.

One note about arrangement and formatting – since I was using fake library names they were all conveniently similar in length, so the percentages after them were more or less in line with each other. However, if I were to be using names of vastly different length like, oh, Peace College and St. Andrews Presbyterian College, any numbers after the college names would be very difficult to scan through. There are two basic solutions to this problem, using the functions I’ve already introduced.

Solution 1: Abbreviate the names to similar lengths. You could, for example, cut “St. Andrews Presbyterian College” down to “St. Andrews”. You could, in fact, use =left() to just take the leftmost 15 or so characters from each label. If you did so you’d want to arrange them in alphabetical order and find any duplicates. If there were some duplicates you’d want to figure out abbreviations that would still convey the proper identification, and manually change them. For example, the University of North Carolina system has a lot of schools named “University of North Carolina at <<insert city name here>>” Anyone who’s reading a chart about North Carolina university libraries should be able to figure out what “UNC Greensboro” and “UNC Wilmington” are.

Solution 2: Put the numbers first. This is simpler by far, but I prefer to put the names first when it’s going in a legend, because English speakers read from left to right, and the library’s identity is something you want to know before you get into the statistic(s) about it.

In the second pie chart, I did away with the legend altogether. The library was identified in the title of the chart, and I used data labels to show the number of chats and the date. In this one I chose to put the date second because you already have a vague idea of the date based on how far around the chart it is. The first slice is January, the second one is February, and so on. It’s not perfect, but it’s enough to go on. Also, because the data is around a circle, rather than in a nice neat column, it’s very easy for our eye to pick which part of the label to focus on. Lastly, it means that the year, which is a number, is distinctly separated from the number of chats.

You may notice, however, that the year only occurs on the January entry – all the others just have the number of chats and the month. That’s because you don’t really need it on the others, and it would just end up cluttering things and making it hard to read. I did this little trick by using an if() statement: if the month was January then include the year, which translates to =if(month(Date)=1,concatenate(” “,year(Date)),””). I have the extra concatenate() in there to add a space between the month and the date – otherwise I’d end up with Jan2012.

In the third pie chart I went back to using a legend, so that I could show the exact percentages for each month in one location, and the actual values in a different one. The chart itself gives us a good idea of the percentages — the first four months have the majority of the traffic, then it abruptly drops in the fifth month, then slowly increases again. It also tells us what scale we’re looking at…does that first chunk represent about 30 chats, or 300, or 3000? The legend lists the exact percentages and the months in a clean, easy to read fashion, making it easier to see which slice corresponds to which month and also giving an alternative representation for alternative learning styles.

On the line chart for total usage I again used the if() trick to get the year to show up only in January. In this case I put the year before the month, so that it would be the part sticking out from the rest of the line. Have you ever seen a dictionary that had indentations in the side of the pages marking the start of each letter? It makes it much easier to go straight to that section, since you only have to parse a single letter rather than the whole words that appear at the top corners of the pages. This is similar, making the year stick out in isolation so it’s easy to scan for that, and then look deeper to find the right month.