When is a spreadsheet not a spreadsheet?

I know that sounds like the opening to a “frayed knot” joke, but I couldn’t come up with a suitable punchline. However, it’s still a valid question that I think it’s time to address.

There are two types of data formats that aren’t exactly spreadsheets, but naturally and logically belong in the same category: Machine-readable formats, and human readable formats.

The first type, machine-readable formats, can be read by a human, but they can be somewhat difficult to interpret. For example, in comma delimited format each cell value is separated by a comma, and each row is separated by a new line. Which leads to files like this example:

Name,Test1,Test2,Test3,Test4,Test5,Test6,Test7,Test8,Test9,Test10,Test11
Alex,853,294,129,958,502,846,815,346,223,32,779
Bob,84,66,546,878,995,139,195,21,689,288,338
Catherine,682,932,540,274,178,801,435,557,433,36,261

Quick, what was Catherine’s score on Test7? You can figure it out by counting the numbers, but it’s not easy to see at a glance. Fortunately if you imported this into a spreadsheet, the program would automatically straighten it out for you.

Comma delimited formats are usually saved in text files with a .CSV extension instead of a .TXT extension. .CSV stands for Comma Separated Values.

Another common machine-readable format is tab delimited. It’s the same principle as the comma delimited format, but with tabs instead of commas. They can have either a .TSV (Tab Separated Values) or .txt extension. The tricky thing about trying to read a tab delimited file without putting it in a spreadsheet is that it can look like it’s in tidy columns when it’s actually not. See how, in the table below, Alex and Bob’s 3rd scores line up neatly with the header label for Test2, and with Catherine’s 2nd score?

Name Test1     Test2     Test3     Test4     Test5     Test6     Test7     Test8     Test9
Alex 853  294  129  958  158  486  12   585  101
Bob  84   66   546  878  109  339  291  544  517
Catherine 682  932  540  274  206  498  96   897  997

The second type, human-readable formats, are designed to be read easily by humans. This might be because they were created before spreadsheets and word processors that handled tables were even an option, or for a number of other reasons. I can really only think of one type of format in this category, and that’s the “fixed column width” format. In this format every column has a specific number of characters. Usually that number is one or two more than the longest entry in that column. For example:

Name       Test1  Test2  Test3  Test4  Test5  Test6  Test7  Test8  Test9
Alex        853    294    129    958    649    367    338    843    488
Bob          84     66    546    878    517    917    878    388    925
Catherine   682    932    540    274    227    497    581    987      4

I’ll talk more about how to turn these formats into spreadsheets, and turn spreadsheets into these formats, later.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s