realspirituals - 6 months ago
Bash Question

Enclose a string where the closing double quote is missing

I have an input file like the one below. The file is pipe-delimited, and fields are optionally enclosed in double quotes. The problem is that the closing double quote is sometimes missing at the end of the third field, and it seems to happen whenever that field is longer than about 2 characters.

"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2|10301 # 3rd field -> closing " missed out
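To confirm which records are affected before changing anything, a quick check can list them. This is a minimal sketch, assuming a third field that opens with a quote should also close with one. The printf stands in for the input file (the annotation on the last line is left out); with a real file, pass its name to awk instead.

```shell
# Sample records from the question, fed through a pipe instead of a file.
# awk then reports every record whose 3rd field opens a quote but never closes it,
# prefixed with its line number.
printf '%s\n' \
  '"SER1828"|"ZXC"|"A1"|10002' \
  '"SER1878"|"IOP"|"B1"|98989' \
  '"SER1930"|"QWE"|"A2"|10301' \
  '"SER1930"|"QWE"|"Asdf2|10301' |
  awk -F'|' '$3 ~ /^"/ && $3 !~ /"$/ { print NR ": " $0 }'
```

For the sample data this prints only line 4.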


The output should look like

"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301


I tried the following awk commands but could not get them to work:

awk -F'|' -v q=\" '{$3=$3 q;}1' OFS=| temp
awk -F'|' -v q=\" '{if (length($3) > 2) ($3=$3;}1)}' OFS='|' temp

Answer

Using awk, you can write:

awk -F'"?\\|' -vOFS='"|' '{print $1, $2, $3, $4}'

Example

awk -F'"?\\|' -vOFS='"|' '{print $1, $2, $3, $4}'  input
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301

  • What does it do?

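  • -F'"?\\|' sets the input field separator to the regular expression "?\| (awk turns the doubled backslash into a single one), which matches either "| or a bare |. Because the separator swallows any closing quote that sits before a |, fields 1 to 3 all come out without their closing quote, whether or not the input had one.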

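  • -vOFS='"|' sets the output field separator to "|. It is inserted between every pair of output fields, regardless of whether the input separator matched "| or a bare |, so each of the first three fields gets exactly one closing quote back.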
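To see the separator at work, the sketch below prints every field of a well-formed record and of the broken record in brackets, labelled record.field. Both split into the same shape: each of the first three fields keeps its opening quote and loses its closing one. The printf stands in for the question's input file.

```shell
# Split one good and one broken record with the answer's separator "?\|
# and print each field between brackets so the missing quotes are visible.
printf '%s\n' '"SER1930"|"QWE"|"A2"|10301' '"SER1930"|"QWE"|"Asdf2|10301' |
  awk -F'"?\\|' '{ for (i = 1; i <= NF; i++) printf "%d.%d: [%s]\n", NR, i, $i }'
```

Expected output:

    1.1: ["SER1930]
    1.2: ["QWE]
    1.3: ["A2]
    1.4: [10301]
    2.1: ["SER1930]
    2.2: ["QWE]
    2.3: ["Asdf2]
    2.4: [10301]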


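Alternatively, to avoid listing the fields one by one, you can write

awk -F'"?\\|' -vOFS='"|' '{$1=$1} 1' input

Here $1=$1 forces awk to rebuild the record using OFS. Without it, 1 would print the original line unchanged, because awk only rejoins the fields with OFS after a field (or NF) is assigned. The 1 is a pattern that is always true, so the default action prints the rebuilt line.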
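The first attempt in the question had the right idea but two problems: OFS=| is unquoted, so the shell treats the | as a pipe, and the quote is appended on every line, which would turn "A1" into "A1"". A hedged sketch that guards the append with a condition, so only field 3 is touched and only when its closing quote is missing, could look like this. Unlike the answer above, it does not assume fields 1 and 2 are quoted; like the answer, it assumes no quoted field contains a literal |. The printf stands in for the input file; with a real file, pass its name to awk instead.

```shell
# Only patch field 3: if it starts with " but does not end with ", append the missing quote.
# Unmodified records are printed exactly as read; modified ones are rejoined with OFS (|).
printf '%s\n' \
  '"SER1828"|"ZXC"|"A1"|10002' \
  '"SER1878"|"IOP"|"B1"|98989' \
  '"SER1930"|"QWE"|"A2"|10301' \
  '"SER1930"|"QWE"|"Asdf2|10301' |
  awk -F'|' -v OFS='|' '
    $3 ~ /^"/ && $3 !~ /"$/ { $3 = $3 "\"" }
    { print }
  '
```

For the sample data this prints the four lines of the expected output from the question.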