teepu teepu - 5 months ago 14
Linux Question

Remove extra commas from only 2nd and 3rd row of CSV file

I have a comma delimited file (CSV file)

test.csv
as shown below.

FHEAD,1,2,3,,,,,,
FDEP,2,3,,,,,,,,
FCLS,3,,,4-5,,,,,,,
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,


I wanted to remove the empty columns only from 2nd and 3rd row of the file i.e. were ever the records starts with
FDEP
and
FCLS
only in those rows I wanted to remove the empty columns (,,).

after removing the empty columns the same file
test.csv
should look like

FHEAD,1,2,3,,,,,,
FDEP,2,3
FCLS,3,4-5
FDETL,4,5,6,7,8,
FTAIL,5,67,,,,,,


How can I do this in Unix???

Answer

Here's one way to do it, using sed:

sed '/^F\(DEP\|CLS\),/ { s/,\{2,\}/,/g; s/,$// }'

We use a range of /^F\(DEP\|CLS\),/, i.e. the following command will only process lines matching ^F\(DEP\|CLS\),. This regex matches beginning-of-string, followed by F, followed by either DEP or CLS, followed by ,. In other words, we look for lines starting with FDEP, or FCLS,.

Having found such a line, we first substitute (s command) all runs (g flag, match as many times as possible) of 2 or more (\{2,\}) commas (,) in a row by a single ,. This squeezes ,,, down to a single ,.

Second, we substitute , at end-of-string by nothing. This gets rid of any trailing comma.