user1479571 user1479571 - 2 months ago 12
Linux Question

how to replace delimiter when delimiter also present in field data at any place in a file?

I have content of input file in linux, something like

"test1",2,2,,0,"Sun,day",Jan
"be,st3",1,0,,0,"Sunday",Feb1
"nest",0,0,,0,"Sunday",Jul
"rest,5",,,,0,"Sunday",Aug


Need Output like

test1|2|2||0|Sun,day|Jan
be,st3|1|0||0|Sunday|Feb1
nest|0|0||0|Sunday|Jul
rest,5||||0|Sunday|Aug


used tr command to replace the , with | it replaces the fieldvalue as well. I am not able to understand how to only change the delimiter not the values. can someone please provide some pointers how and which commands i can use to perform this task ?

Answer

That's a difficult task to achieve with sed. I'm not saying it's impossible but difficult.

In python (version 3.x), on the other hand, done with a few lines:

import csv

with open("input.csv") as fr:
    with open("output.csv","w",newline='') as fw:  # uncomment for python 3.x
    with open("output.csv","wb") as fw:            # python 2.x only
        cr = csv.reader(fr,delimiter=",")
        cw = csv.writer(fw,delimiter="|")
        cw.writerows(cr)

How it works: It just uses the amazing built-in csv module. read with a separator, write with another.

okay, now just for fun ... my sed solution if you really want to know

create a sedfile like this:

s/"\([^",]\+\)"/\1/g
s/"\([^"]\+\),\([^"]\+\)"/\1%\2/g
s/,/\|/g
s/%/,/g

apply it sed -f sedfile.txt input.csv > output.csv

how it works:

  • strips the quotes from the simple fields
  • strips the quotes from the fields containing commas, but replace commas by percent signs
  • change the separator
  • change back percent signs by comas

result:

test1|2|2||0|Sun,day|Jan
be,st3|1|0||0|Sunday|Feb1
nest|0|0||0|Sunday|Jul
rest,5||||0|Sunday|Aug

it's limited to 1 coma per protected field (can be extended to 3 or more...) and fields must not use the % sign.