user2340612 - 5 months ago
Bash Question

Awk/sed replace newlines

Intro:

I have been given a CSV file in which the field delimiter is the pipe character (i.e., |). This file has a predefined number of fields (say N). I can discover the value of N by reading the header of the CSV file, which we can assume to be correct.
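
For reference, N can be read straight from the header with awk's NF. This is only a minimal sketch: it assumes the sample data shown in this question, written to a temporary file whose name is made up for the demo.

```sh
#!/bin/sh
# Hypothetical temporary file holding the sample data from the question.
csv="${TMPDIR:-/tmp}/demo_header.$$.csv"
printf '%s\n' \
    'FIRST_NAME|LAST_NAME|NOTES' \
    'John|Smith|This is a field with a' \
    'newline' \
    'Foo|Bar|Baz' > "$csv"

# With -F'|' the first record is split on pipes, so NF is the expected field count N.
num_fields=$(awk -F'|' 'NR == 1 { print NF; exit }' "$csv")
echo "$num_fields"   # prints 3

rm -f "$csv"
```

This avoids counting separators by hand with tr and adding one.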

Problem:

Some of the fields contain a newline character by mistake, which makes the line appear shorter than required (i.e., it has M fields, with M < N).
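
To see which physical lines are affected, a small check can list every line with fewer than N fields. This is a sketch under the same assumptions as the sample in this question; the temporary file name is hypothetical.

```sh
#!/bin/sh
# Hypothetical temporary file holding the sample data from the question.
csv="${TMPDIR:-/tmp}/demo_short.$$.csv"
printf '%s\n' \
    'FIRST_NAME|LAST_NAME|NOTES' \
    'John|Smith|This is a field with a' \
    'newline' \
    'Foo|Bar|Baz' > "$csv"

# Remember N from the header, then report each later line that has fewer fields.
short_lines=$(awk -F'|' '
    NR == 1 { n = NF; next }
    NF < n  { print NR ": " NF " of " n " fields" }
' "$csv")
echo "$short_lines"   # prints: 3: 1 of 3 fields

rm -f "$csv"
```

Note that in the sample only the continuation line is reported: the line that the newline cut off still has all three pipes, so it looks complete on its own.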

What I need to create is a sh script (not bash) to fix those lines.

Attempted solution:

I wrote the following script to try to fix the file:

if [ $# -ne 1 ]
then
    echo "Usage: $0 <filename>"
    exit
fi

# get first line
first_line=$(head -n 1 $1)

# get number of fields
num_separators=$(echo "$first_line" | tr -d -c '|' | awk '{print length}')

cat $1 | awk -v numFields=$(( num_separators + 1 )) -F '|' '
{
    totRecords = NF/numFields
    # loop over lines
    for (record=0; record < totRecords; record++) {
        output = ""
        # loop over fields
        for (i=0; i<numFields; i++) {
            j = (numFields*record)+i+1
            # replace newline with question mark
            sub("\n", "?", $j)
            output = output (i > 0 ? "|" : "") $j
        }
        print output
    }
}
'


However, the newline character is still present.
How can I fix that problem?

Example of the CSV:

FIRST_NAME|LAST_NAME|NOTES
John|Smith|This is a field with a
newline
Foo|Bar|Baz


Expected output:

FIRST_NAME|LAST_NAME|NOTES
John|Smith|This is a field with a * newline
Foo|Bar|Baz

* I don't care what the replacement is: it could be a space, a question mark, or anything else except a newline or a pipe (which would create a new field).

Answer
$ cat tst.awk
BEGIN { FS=OFS="|" }
NR==1 { reqdNF = NF; printf "%s", $0; next }
{ printf "%s%s", (NF < reqdNF ? " " : ORS), $0 }
END { print "" }

$ awk -f tst.awk file.csv
FIRST_NAME|LAST_NAME|NOTES
John|Smith|This is a field with a newline
Foo|Bar|Baz
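
As for why the script in the question leaves the newline in place: awk reads its input one record at a time, and with the default RS the newline is the record separator. It is consumed when the record is read, so no field ever contains "\n" and the sub() call has nothing to match. A small demonstration, using made-up two-line input:

```sh
#!/bin/sh
# Count how many fields contain a newline when awk reads two ordinary lines.
# The input 'a|b' / 'c|d' is made up purely for this demonstration.
hits=$(printf 'a|b\nc|d\n' | awk -F'|' '
    { for (i = 1; i <= NF; i++) if ($i ~ /\n/) n++ }
    END { print n + 0 }
')
echo "$hits"   # prints 0
```

That is why the script above works on whole lines and decides whether to emit a space or ORS in front of each one, instead of trying to edit the newline inside a field.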

If that's not what you want, then edit your question to provide more truly representative sample input and the associated output.
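
If the stray newline can also fall in a field other than the last one, one possible variation is to keep appending lines to the current record for as long as the joined record would still have at most N fields. This is only a heuristic sketch, not a general CSV repair: it assumes N is at least 2, that fields never contain a literal pipe, and that a record broken in its last field is not immediately followed by a line with no pipe that starts a new record. The function name fix_csv, the temporary file and the extra break inside "Smith" are all made up for the demo.

```sh
#!/bin/sh
# fix_csv FILE: hypothetical helper that joins broken lines with a space.
fix_csv() {
    awk '
    BEGIN { FS = "|" }
    NR == 1 { n = NF; print; next }        # the header defines N
    {
        k = (NF > 0 ? NF : 1)              # an empty line counts as one empty fragment
        if (have && cnt + k - 1 <= n) {    # joining stays within N fields
            buf = buf " " $0
            cnt += k - 1
        } else {                           # otherwise this line starts a new record
            if (have) print buf
            buf = $0; cnt = k; have = 1
        }
    }
    END { if (have) print buf }
    ' "$1"
}

# Sample from the question plus one made-up break inside the second field.
csv="${TMPDIR:-/tmp}/demo_fix.$$.csv"
printf '%s\n' \
    'FIRST_NAME|LAST_NAME|NOTES' \
    'John|Smi' \
    'th|This is a field with a' \
    'newline' \
    'Foo|Bar|Baz' > "$csv"

fixed=$(fix_csv "$csv")
echo "$fixed"
rm -f "$csv"
```

With that input the three data lines for John are merged into "John|Smi th|This is a field with a newline", and the Foo|Bar|Baz line is left untouched.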