Max von Hippel - 2 months ago
Bash Question

Print, modify, print again Bash variable

I am looping over a CSV file. Each line of the file is formatted something like this (it's OpenStreetMap data):


planet_85.287_27.665_51a5fb91,AcDbEntity:AcDbPolyline,{ [name] Purano Bus Park-Thimi [type] route [route] microbus [ref] 10 } { [Id] 13.0 [Srid] 3857 [FieldsTableId]


This follows the format:


Layer,SubClasses,ExtendedEntity,Linetype,EntityHandle,Text


I want to add a new column for Name. I can find the name in a line by cutting off everything before [name] and after the next [. This code successfully creates a newline-delimited file of all the names (which I open as a CSV and then copy-paste into the original file as a new column).

cat /path/to/myfile.csv | while read line
do
    if [[ ${line} == *"name"* ]]
    then
        printf "$(echo $line | LC_ALL=C sed 's/^.*name\]//g'| LC_ALL=C cut -f1 -d'[') \n"
    else
        printf "\n"
    fi
done >/path/to/newrow.csv


This system is clearly suboptimal - I would far prefer to print the entire final row. But when I replace that printf line with this:

printf "$line,$(echo $line | LC_ALL=C sed 's/^.*name\]//g'| LC_ALL=C cut -f1 -d'[') \n"


It prints the line but not the name. I've tried printing them in separate print statements, printing the line and then echoing the name, saving the name in a variable and then printing, and a number of other techniques, and each time I either a) only print the line, or b) print the name on a new line, which breaks the CSV format.

What am I doing wrong? How can I print the full original line with the name appended as a new column at the end?

NOTE: I am running this in Terminal on macOS Sierra on a MacBook Pro 15" Retina.

Answer

If I understand correctly, you want to extract the name between [name] and [type], and append it as the new last CSV column. You can do that using capture groups:

sed -e 's/.*\[name\] \(.*\) \[type\].*/&,\1/' < input

Notice the \(.*\) in the middle. That captures the text between [name] and [type].
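To see what the group captures on its own, here is a minimal sketch that prints only the captured text. The sample line is a shortened, hypothetical version of the row in the question, and the variable names line and name are only for illustration.

```shell
# Hypothetical sample row, shortened from the data in the question.
line='planet_85,AcDbEntity:AcDbPolyline,{ [name] Purano Bus Park-Thimi [type] route [route] microbus }'

# -n suppresses sed's default output; the trailing p prints a line only
# when the substitution succeeded, so only the captured name is shown.
name=$(printf '%s\n' "$line" | sed -n 's/.*\[name\] \(.*\) \[type\].*/\1/p')

printf '%s\n' "$name"
# prints: Purano Bus Park-Thimi
```

Passing the data through printf '%s\n' rather than using it as the format string keeps any % or backslash characters in the row from being interpreted.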

In the replacement string, & stands for the matched string, which is the entire line, as the pattern starts and ends with .*. Next the , is a literal comma, and \1 stands for the content of the first capture group, the part matched within \(...\).
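As a rough illustration of the whole substitution, the sketch below runs it over two hypothetical rows, one with a name and one without. The second -e expression is my own addition, not part of the command above: it appends an empty field to rows that contain no [name], on the assumption that you want every row to end up with the same number of columns.

```shell
# Two hypothetical rows: the first has a [name] ... [type] pair, the second does not.
rows='layer_a,AcDbEntity,{ [name] Purano Bus Park-Thimi [type] route }
layer_b,AcDbEntity,{ [type] route }'

# First expression: append the captured name as a new last column.
# Second expression (an assumption about the desired output): rows
# without [name] get a trailing comma, i.e. an empty last column.
result=$(printf '%s\n' "$rows" |
  sed -e 's/.*\[name\] \(.*\) \[type\].*/&,\1/' \
      -e '/\[name\]/!s/$/,/')

printf '%s\n' "$result"
# prints:
# layer_a,AcDbEntity,{ [name] Purano Bus Park-Thimi [type] route },Purano Bus Park-Thimi
# layer_b,AcDbEntity,{ [type] route },
```

Without the second expression, rows that do not match are passed through unchanged, because sed leaves a line alone when the pattern does not match. To process a real file, redirect it in and out, for example < /path/to/myfile.csv > /path/to/newfile.csv (the output path here is hypothetical).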
