Deepak K M - 22 days ago
Bash Question

How to make awk ignore the field delimiter inside double quotes?

I need to delete 2 columns in a comma-separated values file.
Consider the following lines in the CSV file:

"abc@xyz.com,www.example.com",field2,field3,field4
"def@xyz.com",field2,field3,field4


Now, the result I want at the end:

"abc@xyz.com,www.example.com",field4
"def@xyz.com",field4


I used the following command:

awk 'BEGIN{FS=OFS=","}{print $1,$4}'


But the embedded comma inside the quotes causes a problem. This is the result I get:

"abc@xyz.com,field3
"def@xyz.com",field4
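
To see where the unexpected output comes from, here is a small diagnostic sketch that prints every field awk produces when FS is a plain comma. It should work with any POSIX awk; the helper name show_fields is made up for this illustration.

```shell
# show_fields is a hypothetical helper used only for this illustration.
# It prints the field count and each field as awk sees them with FS=",".
show_fields() {
    awk 'BEGIN { FS = "," }
    {
        print "NF=" NF
        for (i = 1; i <= NF; i++) print i ": " $i
    }'
}

printf '%s\n' '"abc@xyz.com,www.example.com",field2,field3,field4' | show_fields
```

The quoted value is split into two fields, so the line has 5 fields instead of 4, and $4 ends up being field3.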


So my question is: how do I make awk ignore the commas that are inside the double quotes?

Answer

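From the GNU awk manual (http://www.gnu.org/software/gawk/manual/gawk.html#Splitting-By-Content), using the gawk-specific FPAT variable, which describes what a field looks like instead of what separates fields: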

$ awk -vFPAT='([^,]*)|("[^"]+")' -vOFS=, '{print $1,$4}' file
"abc@xyz.com,www.example.com",field4
"def@xyz.com",field4
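
As a rough check of what FPAT does, the sketch below lists the fields gawk finds on the first sample line. It assumes GNU awk is installed under the name gawk (FPAT is not available in POSIX awk or mawk), and the helper name show_fpat_fields is made up for this illustration.

```shell
# Requires GNU awk (gawk); FPAT is a gawk extension.
# show_fpat_fields is a hypothetical helper used only for this illustration.
# A field is either a run of non-comma characters or a double-quoted string.
show_fpat_fields() {
    gawk -v FPAT='([^,]*)|("[^"]+")' '{
        print "NF=" NF
        for (i = 1; i <= NF; i++) print i ": " $i
    }'
}

printf '%s\n' '"abc@xyz.com,www.example.com",field2,field3,field4' | show_fpat_fields
```

With this pattern the quoted value should stay together as $1, giving 4 fields, which is why print $1,$4 returns the first and last columns.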

See also "What's the most robust way to efficiently parse CSV using awk?" for parsing CSVs more generally, including files with newlines and other special content inside fields.