Lisann - 3 months ago
Linux Question

bash: grep exact matches based on the first column

I have a .txt file like below:

9342432_A1 9342432 1 0 0 0
4392483_A2 4392483 2 0 0 0
4324321_A3 4324321 1 0 0 0
9342432 9342432 2 0 0 0


For example, I want to generate a subset with the IDs 4324321_A3 and 9342432 (based on the first column!).
I tried the following command to find the exact matches:

grep -E '4324321_A3|9342432' file


But when I use this line, I end up with a dataset like this:

9342432_A1 9342432 1 0 0 0
4324321_A3 4324321 1 0 0 0
9342432 9342432 2 0 0 0


The problem is that the line that matches a part of the ID (9342432_A1) shouldn't be there.
Can anyone help me with this?

I would like to end up with this:

4324321_A3 4324321 1 0 0 0
9342432 9342432 2 0 0 0

Answer

It matches

9342432_A1 9342432 1 0 0 0

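because grep looks for the pattern anywhere in the line, and 9342432 appears both at the start of the first column (9342432_A1) and in the second column.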
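
If you want to check where the pattern hits, here is a small sketch that is not part of the original answer. It uses grep -o, which prints every matched piece on its own line. The temporary file and the variable names (file, matches) are only there for illustration.

```shell
# Recreate the sample data from the question in a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
9342432_A1 9342432 1 0 0 0
4392483_A2 4392483 2 0 0 0
4324321_A3 4324321 1 0 0 0
9342432 9342432 2 0 0 0
EOF

# Look only at the first line (the unwanted one) and print each match separately.
matches=$(head -n 1 "$file" | grep -oE '4324321_A3|9342432')
printf '%s\n' "$matches"
# prints:
# 9342432
# 9342432

rm -f "$file"
```

The pattern is found twice in that single line, once in each of the first two columns, so the line is kept.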

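Anchoring the patterns to the start of the line with ^word is the first step, but it is not enough on its own: ^9342432 still matches the beginning of 9342432_A1, so that line is kept:

$ grep -E '^4324321_A3|^9342432' file
9342432_A1 9342432 1 0 0 0
4324321_A3 4324321 1 0 0 0
9342432 9342432 2 0 0 0

To get exact matches, also use -w, which only accepts a match that forms a whole word. Letters, digits and underscores count as word characters, so 9342432 followed by _A1 is rejected:

$ grep -wE '^4324321_A3|^9342432' file
4324321_A3 4324321 1 0 0 0
9342432 9342432 2 0 0 0

For the same reason, this command would not match a line like

4324321_A3something 4324321 1 0 0 0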
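
One caveat with -w: characters such as - and . are not word characters, so an ID like 9342432-B would still be matched by ^9342432. If that can happen in your data, one possible alternative (a sketch, not something the answer above proposed) is to let awk compare the first field as a whole string. The file names and the variables below (file, ids, subset, subset_from_list) are made up for the example.

```shell
# Recreate the sample data from the question in a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
9342432_A1 9342432 1 0 0 0
4392483_A2 4392483 2 0 0 0
4324321_A3 4324321 1 0 0 0
9342432 9342432 2 0 0 0
EOF

# Keep a line only if its first whitespace-separated field equals an ID exactly.
subset=$(awk '$1 == "4324321_A3" || $1 == "9342432"' "$file")
printf '%s\n' "$subset"
# prints:
# 4324321_A3 4324321 1 0 0 0
# 9342432 9342432 2 0 0 0

# Same idea with the IDs stored one per line in a separate (hypothetical) list file.
ids=$(mktemp)
printf '%s\n' 4324321_A3 9342432 > "$ids"

# While reading the first file (NR == FNR), remember each ID;
# for the second file, print lines whose first field is a remembered ID.
subset_from_list=$(awk 'NR == FNR { wanted[$1]; next } $1 in wanted' "$ids" "$file")
printf '%s\n' "$subset_from_list"

rm -f "$file" "$ids"
```

The second form is handy when the list of IDs is long, since you do not have to build one big pattern by hand. It assumes the ID list is not empty.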