I have a bunch of CSV files arriving as a daily feed from another system.
I have to remove the header and any blank lines that may be present from each file before loading it onto HDFS and building an external table on top of it.
Currently I have a two-step process that removes the header and blank lines before putting the file on HDFS:
# remove blank lines
sed -i '/^\s*$/d' file_20160802.csv
# remove the header (first line)
sed -i 1d file_20160802.csv
# put file on HDFS
hdfs dfs -put file_20160802.csv /raw/abc/20160802/
You can combine both sed commands into a single call like this:
sed -i '1d; /^\s*$/d' file
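One caveat: 1d deletes the first physical line of the file, so if a file can start with blank lines, the single command removes a blank line and leaves the header in place, whereas the original two-step order removes the blank lines first. A minimal sketch that keeps the original order without editing the file in place is below. The function name strip_header_and_blanks and the sample data are made up for illustration, and [[:space:]] is used instead of \s only because \s is a GNU sed extension.

```shell
#!/bin/sh
# Hypothetical helper: print the file named in $1 without blank lines
# and without the header.
strip_header_and_blanks() {
    # First sed drops empty and whitespace-only lines;
    # second sed drops the first remaining line, i.e. the header.
    sed '/^[[:space:]]*$/d' "$1" | sed 1d
}

# Sample file invented for this example: a leading blank line,
# a header, and a whitespace-only line between the data rows.
sample=$(mktemp)
printf '\nid,name\n1,alice\n   \n2,bob\n' > "$sample"
strip_header_and_blanks "$sample"
rm -f "$sample"
```

Because the cleaned data goes to standard output, it could probably be streamed straight to HDFS instead of rewriting the local file: hdfs dfs -put reads from stdin when the source is given as -, so something like strip_header_and_blanks file_20160802.csv | hdfs dfs -put - /raw/abc/20160802/file_20160802.csv should work, assuming that destination file name suits your layout.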