Marco Falzone Marco Falzone - 4 months ago 10
Linux Question

How can I replace a comma character in a CSV ONLY when included between " "

I have a csv file which contains both words and amounts. When the amount is > 999 the number is enclosed within " " in order to differentiate the comma character used as thousand separator by the comma used as field separator, like this:

black, "1,340.00", brown, white, 150.00, blue
apple, 10.00, bread, coffee, "1,850.00", juice
cat, dog, 995.00, tiger, "2,450.00"


I wish to remove the comma ONLY where it's enclosed between " ", leaving the other comma (field separators), and also remove the " ".
The output of the new csv should be like this:

black, 1340.00, brown, white, 150.00, blue
apple, 10.00, bread, coffee, 1850.00, juice
cat, dog, 995.00, tiger, 2450.00


I played around with sed and awk but I'm not sure about the best way to achieve it.
Thank you!

Answer
$ awk -F\" '{for (i=2; i<=NF; i+=2) gsub(/,/,"",$i)} 1' OFS="" input.csv
black, 1340.00, brown, white, 150.00, blue
apple, 10.00, bread, coffee, 1850.00, juice
cat, dog, 995.00, tiger, 2450.00

How it works

  • -F\"

    This tells awk to use double-quotes as the field separator.

  • for (i=2; i<=NF; i=i+2) gsub(/,/,"",$i)

    Every even field is a field in double-quotes. For those even fields, we remove commas.

  • 1

    This is awk's cryptic shorthand for print-the-line.

  • OFS=""

    This tells awk to use an empty string as the field separator on output. This has the effect of removing the quotes.