abbie abbie - 6 months ago 9
Linux Question

Remove ":" from field two of CSV only and ignore other fields

I've been trying to clean up the data in a csv file which contain data similar to this:


8979880, Number One : Exclusive Mix, 387387, http://www.smashhits.com
4844404, Top 40 : 1988, 3893938, http://www.best80s.com
48094940, Highlander:The Return, 489494, http://www.instantaccess.com


My goal is to replace the colon in field 2 with a space. Initially I used sed to replace the : with a spacelike so:


sed i "s/:/ /g" file.csv


This works in removing the colon but unfortunately this also removes the colon in the url which is not what I want. How can I specify that I only want the command to affect the data in field 2?

Answer

You can use gnu sed like this:

sed -r 's/^([^,]*,[^,]*):/\1 /g' file.csv

Explanation

  • ^ anchors the expression at the start of each line
  • now [^,]*, matches the first field including the separator
  • and then [^,]*: matches from the second field to the :
  • the parenthises ^(...): take care that everything up to but not including the : in the second field is captured into \1
  • finally the replacement with \1 (there is a space after the \1 does the replacement of the : with space on line where the regex matched