Dhruuv Dhruuv - 5 months ago 35
Linux Question

Remove double quotes and comma from a numeric value of a .CSV file

I have a .CSV file which has few records with numbers in them which are enclosed in double quotes (such as in "455,365.44") and commas in between the quotes. I need to remove the comma from the numeric values("455,365.44" should look like 455365.44 after processing) of the records so I could use them in the further processing of the file.

Here is a example of the file

column 1, column 2, column 3, column 4, column 5, column 6, column 7
12,"455,365.44","string with quotes, and with a comma in between","4,432",6787,890,88
432,"222,267.87","another, string with quotes, and with two comma in between","1,890",88,12,455
11,"4,324,653.22","simple string",77,777,333,22


and I need the result look like:

column 1, column 2, column 3, column 4, column 5, column 6, column 7
12,455365.44,"string with quotes, and with a comma in between",4432,6787,890,88
432,222267.87,"another, string with quotes, and with two comma in between",1890,88,12,455
11,4324653.22,"simple string",77,777,333,22


P.S: I need only the values which are numeric to be converted like this and the string values should remain same.

Please help...

Answer

To remove the quotes (replace the number with the quotes with the number without them):

s/"(\d[\d.,]*)"/\1/g

See on rubular

For the commas I could only think of a lookahead and lookbehind, if thats supported by your regex implementation (replace commas with nothing if before and after is a number within quotes):

s/(?<="[\d,]+),(?=[\d,.]+")//g

You would have to execute this before removing the quotes.

It might also work without lookbehind:

s/,(?=[\d,.]*\d")//g

See on rubular

In a shell script you might want use perl e.g. execute:

cat test.csv | perl -p -e 's/,(?=[\d,.]*\d")//g and s/"(\d[\d,.]*)"/\1/g'

Explanation of the regex:

first execute:

s/,(?=[\d,.]*\d")//g 

This will remove all commas that are followed by a number ([\d,.]*\d) and a quote, thus removing only commas from numbers within quotes

next execute

s/"(\d[\d,.]*)"/\1/g

This will replace all numbers that are within quotes by the value without the quotes

Comments