I have approximately 90GB of uncompressed .csv data to import into SQL. Depending on the ultimate requirements of the interface, I may or may not be able to partition this data, but either way these are going to be some big tables. Though large in total, the rows are relatively simple:
[Exchange] [Symbol] [Compare_To_Exchange] [Compare_To_Symbol] [Date] [Correlation]
NYSE ZTS INDEX DCXE 12/9/2013 0:00 -0.5521076
It is important to remember that you are building a database here, not a CSV file. Compression makes a lot of sense for unstructured data such as a CSV file, but in a well-designed data model it buys you little: you have structured data.
If you normalize the data, you don't need to compress at all. In your example you are showing four short strings plus a timestamp plus what is probably a 32-bit numeric, so about 160 bits per row.
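To spell out that arithmetic (field widths as proposed below: 16-bit lookup indexes, a 64-bit timestamp, a 32-bit numeric):

```python
# Per-row size of the normalized layout, in bits
index_fields = 4 * 16   # Exchange, Symbol, Compare_To_Exchange, Compare_To_Symbol
timestamp = 64          # 64-bit timestamp
correlation = 32        # 32-bit numeric
row_bits = index_fields + timestamp + correlation
print(row_bits)         # 160 bits, i.e. 20 bytes per row
```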
To be more clear:

- [Exchange] -> 16-bit index into an exchange lookup table
- [Symbol] -> 16-bit index into a symbol lookup table
- [Compare_To_Exchange] -> 16-bit index into the exchange lookup table
- [Compare_To_Symbol] -> 16-bit index into the symbol lookup table
- [Date] -> 64-bit timestamp
- [Correlation] -> 32-bit numeric
That is going to be much smaller than the CSV.
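As a sketch of what that normalized design looks like in practice (table and column names are illustrative, and the date is stored here as epoch seconds, which is one option among several), using SQLite for a self-contained example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE exchange (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE symbol   (id INTEGER PRIMARY KEY, ticker TEXT UNIQUE);
CREATE TABLE correlation (
    exchange_id     INTEGER REFERENCES exchange(id),
    symbol_id       INTEGER REFERENCES symbol(id),
    cmp_exchange_id INTEGER REFERENCES exchange(id),
    cmp_symbol_id   INTEGER REFERENCES symbol(id),
    obs_date        INTEGER,   -- epoch seconds; a plain DATE type also works
    corr            REAL
);
""")

def get_id(table, col, value):
    # Insert the string once; every later row reuses the small integer id.
    con.execute(f"INSERT OR IGNORE INTO {table} ({col}) VALUES (?)", (value,))
    return con.execute(f"SELECT id FROM {table} WHERE {col} = ?", (value,)).fetchone()[0]

# One CSV row, with the repeated strings replaced by lookup ids
row = ("NYSE", "ZTS", "INDEX", "DCXE", 1386547200, -0.5521076)
con.execute(
    "INSERT INTO correlation VALUES (?, ?, ?, ?, ?, ?)",
    (get_id("exchange", "name", row[0]), get_id("symbol", "ticker", row[1]),
     get_id("exchange", "name", row[2]), get_id("symbol", "ticker", row[3]),
     row[4], row[5]),
)
print(con.execute("SELECT COUNT(*) FROM correlation").fetchone()[0])  # 1
```

The point is that each exchange or symbol string is stored once in its lookup table, and the big fact table carries only small integers.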
You might even be able to make some of these 8-bit indexes if the lookup tables are small. And if that field is a date rather than a full timestamp, it can be stored in much less space.
Or, if you are lazy, you could use 32-bit or even 64-bit integers for the indexes; the rows will still be small, provided you use the structure of the data in your design.
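A quick way to compare the row sizes under discussion is to compute the packed-binary width of each layout (the CSV byte count below uses the sample row from the question):

```python
import struct

# Tight layout: four 16-bit lookup ids, a 64-bit timestamp, a 32-bit float
tight = struct.calcsize("<4Hqf")   # 20 bytes
# Lazy layout: 64-bit integers for everything but the correlation
lazy = struct.calcsize("<4qqf")    # 44 bytes
# The same row as CSV text
csv_row = len("NYSE,ZTS,INDEX,DCXE,12/9/2013 0:00,-0.5521076\n")
print(tight, lazy, csv_row)
```

Even the lazy 64-bit version is no bigger than the CSV text, and the tight version is less than half its size, before any index or page overhead a real database would add.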