Mrd05d - 1 year ago
Linux Question

Split Large CSV into multiple files by column

I have a CSV that is 4.5GB in size. I would like to separate this CSV into different files based on columns. For example:

File1.csv: Cols(1,35,36,37)
File2.csv: Cols(1,127,129,135)
File3.csv: Cols(1,285,287,299,311)
File4.csv: Cols(1,2,4,5,6,12,13,14)

Note: column 1 is an ID column and is needed in every output file.

Essentially, I want to break this 328-column CSV into many smaller CSVs for MySQL import.

I could easily do this with multiple awk commands, but I do not want to re-read the entire 4.5GB-6GB file for each command. Any suggestions?

Answer

You can redirect the output of print to different files within a single awk program, so the input file is read only once:

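awk -F',' -v OFS=',' '{
    # -F and OFS split and join on commas; assumes no quoted commas inside fields.
    # ">" truncates each file on first write, then appends for the rest of the run.
    print $1, $35, $36, $37                  > "file1.csv"
    print $1, $127, $129, $135               > "file2.csv"
    print $1, $285, $287, $299, $311         > "file3.csv"
    print $1, $2, $4, $5, $6, $12, $13, $14  > "file4.csv"
}' inFile.csv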
