aceminer - 2 months ago
Bash Question

How to get column index of field in unix shell

I have a CSV file with a header row:

a,b,c,d,e,f,g,h

I would like to do something like this:

cat abc.csv | sed "something to split them" | grep "e"

#position of "e"


Can someone show me how to get the column index of the header 'e'?

Answer

Assuming your goal is to find out which column a given header is in, you have a number of options. This one works:

sed -n $'1s/,/\\\n/gp' abc.csv | grep -n e
#output: 5:e

If you want to get just the number out of that:

sed -n $'1s/,/\\\n/gp' abc.csv | grep -n e | cut -d: -f1
#output: 5
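If you need this in a script, the pipeline can be wrapped in a small function. This is only a sketch: col_index is a made-up name, and the -x and -F flags on grep are my additions so that only a header that equals the name exactly (as a plain string, not a regex) is reported. It assumes bash and a simple comma-separated header with no quoted fields.

```bash
#!/usr/bin/env bash
# col_index FILE NAME: print the 1-based column number of header NAME.
# (col_index is a hypothetical helper name, not something from the answer.)
col_index() {
  local file=$1 name=$2
  # Split line 1 on commas, number the lines, keep only the exact match.
  sed -n $'1s/,/\\\n/gp' "$file" | grep -nxF -- "$name" | cut -d: -f1
}

# Demo on a throwaway copy of the example data
tmp=$(mktemp)
printf 'a,b,c,d,e,f,g,h\n1,2,3,4,5,6,7,8\n' > "$tmp"
col_index "$tmp" e    # prints 5
rm -f "$tmp"
```

One thing to watch: the p flag on s///gp only prints when a substitution happened, so a header line with no commas at all (a single-column file) produces no output here.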

Explanation:

Since the headers are on the first line of the file, we use the -n option to tell sed not to print out all the lines by default. We then give it an expression that starts with 1, meaning it is only executed on the first line, and ends with p, meaning that line gets printed out afterward.

The expression uses ANSI-C quoting ($'...', supported by bash, ksh, and zsh) simply so it's easier to read: you can write a newline as \n instead of having to include a literal newline. By the time the shell is done with it, the expression $'1s/,/\\\n/gp' gets passed to sed as 1s/,/\ followed by a literal newline and then /gp, because \\ becomes a single backslash and \n becomes a newline. That tells sed to replace every comma on the first line with a newline and then print out the result. The output of just the sed on your example would be this:

a
b
c
d
e
f
g
h
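If you want to convince yourself of what sed actually receives, you can compare the ANSI-C quoted expression with the same expression written using a literal newline. A minimal sketch, assuming bash; the variable names are just for illustration:

```bash
#!/usr/bin/env bash
# The same sed expression spelled two ways.
ansi_expr=$'1s/,/\\\n/gp'   # \\ -> one backslash, \n -> a newline
literal_expr='1s/,/\
/gp'                        # backslash followed by a real newline

if [ "$ansi_expr" = "$literal_expr" ]; then
  echo "identical"          # prints: identical
fi

# Only line 1 is split and printed; the data row is suppressed by -n.
printf 'a,b,c,d,e,f,g,h\n1,2,3,4,5,6,7,8\n' | sed -n "$ansi_expr"
```

The backslash-newline form in the replacement is the portable way to insert a newline with sed, which is why the expression is built this way rather than relying on sed itself to interpret \n.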

We then pipe that output through a grep command looking for e, with the -n option that tells grep to include the line number of matching lines in its output. In the example, it outputs 5:e, where the 5: says that the rest of the output is from the 5th line of the input.
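One caveat with real header names: grep e matches any line that contains the letter e, not just the header that is exactly e. Adding -x (match the whole line) and, optionally, -F (treat the pattern as a plain string rather than a regex) narrows it down. A quick sketch with made-up headers:

```bash
#!/usr/bin/env bash
# Made-up header names, one per line, as they would come out of sed.
headers=$(printf '%s\n' id name e email)

# Plain -n: every line containing "e" matches.
printf '%s\n' "$headers" | grep -n e
# 2:name
# 3:e
# 4:email

# -x: only a line that is exactly "e" matches.
printf '%s\n' "$headers" | grep -nxF e
# 3:e
```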

We can then pipe that through cut with a field delimiter (-d) of : to extract just the first field (-f1). That is the line number in the sed output, and therefore the field number in the original file.
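Since there are a number of ways to do this, here is one of the other options for comparison: a single awk call that splits the first line on commas and prints the index of the matching field. This is a swapped-in technique, not the sed pipeline above, and col_index_awk is a hypothetical name. It makes the same assumption of a plain comma-separated header with no quoted fields.

```bash
#!/usr/bin/env bash
# col_index_awk NAME: read CSV on stdin, print the 1-based index of header NAME.
# (hypothetical helper name, for illustration only)
col_index_awk() {
  awk -F, -v name="$1" '
    NR == 1 {
      # Walk the fields of the header line and report exact matches.
      for (i = 1; i <= NF; i++)
        if ($i == name) print i
      exit   # stop after the header; the data rows are never read
    }'
}

printf 'a,b,c,d,e,f,g,h\n1,2,3,4,5,6,7,8\n' | col_index_awk e   # prints 5
```

Usage on a file would be col_index_awk e < abc.csv. Because the comparison is on the whole field, a header like email does not match when you ask for e.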