Vikas Saxena - 8 months ago
Linux Question

Removing header and blank lines from a CSV file

I have a bunch of csv files coming in the form of a daily feed from some other system.

I have to remove the header and any blank lines that may be present from the files before loading them onto HDFS and building an external table on top of them.

Currently I have a two-step process that removes the blank lines and the header before putting the file on HDFS:

# remove blank lines
sed -i '/^\s*$/d' file_20160802.csv

# remove header
sed -i 1d file_20160802.csv

# put file on HDFS
hdfs dfs -put file_20160802.csv /raw/abc/20160802/

Is there a way I can combine the two steps without creating any temporary files?

sat

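You can combine both expressions in a single sed call, separated by a semicolon. Note that 1d deletes the first physical line, so this assumes the header is on line 1 (no blank lines before it):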

sed -i '1d; /^\s*$/d' file
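
If blank lines can appear before the header, the two-step version in the question removes the first non-blank line, while the one-liner above removes line 1 whatever it contains. A minimal sketch that treats the first non-blank line as the header is shown here. The function name strip_header_and_blanks is hypothetical, and awk is swapped in for sed because it makes this case short. It writes to stdout, so the local file is left unmodified:

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): print a CSV without
# blank lines and without its header, where the header is assumed to be
# the first non-blank line.
strip_header_and_blanks() {
    # NF is 0 for lines that are empty or contain only spaces and tabs,
    # so those are skipped. seen++ evaluates to 0 (false) only for the
    # first non-blank line, which drops the header.
    awk 'NF && seen++' "$1"
}

# Example usage, assuming an hdfs client where "-put -" reads from stdin:
# strip_header_and_blanks file_20160802.csv |
#     hdfs dfs -put - /raw/abc/20160802/file_20160802.csv
```

Piping straight into hdfs avoids rewriting the local file at all, which may matter if the raw feed should be kept as received.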