AishwaryaKulkarni - 2 months ago
Bash Question

Remove the first lines until the first occurrence of a regular expression in a column

I have some lines that I get in sorted order using the following:

grep ENSG00000006114 File | sort -V
chr17 35874900 35879174 ABCD0000006114:I25 -
chr17 35874901 35879174 ABCD0000006114:I25 -
chr17 35875548 35875671 ABCD0000006114:E27 -
chr17 35875672 35877289 ABCD0000006114:I26 -
chr17 35877290 35877445 ABCD0000006114:E26 -
chr17 35877446 35877932 ABCD0000006114:I25 -


However, I want to delete the leading rows that contain ':I' in the fourth column, up to the first row that contains ':E'. For that I have been trying something like:

grep ENSG00000006114 File | sort -V | awk '{if ($4 ~ /:I/ && NR==1) next};1'


However, there may be more than one such row at the start, as in the case above. How do I exclude the rows containing :I until the first :E occurs, so that my final output would be:

chr17 35875548 35875671 ABCD0000006114:E27 -
chr17 35875672 35877289 ABCD0000006114:I26 -
chr17 35877290 35877445 ABCD0000006114:E26 -
chr17 35877446 35877932 ABCD0000006114:I25 -

Answer

You can use this awk:

grep ENSG00000006114 File | sort -V |
awk 'p==1 && $4 ~ /:E/{p=2} !p && $4 ~ /:I/{p=1} p==1{next} 1'

chr17   35875548    35875671    ABCD0000006114:E27  -
chr17   35875672    35877289    ABCD0000006114:I26  -
chr17   35877290    35877445    ABCD0000006114:E26  -
chr17   35877446    35877932    ABCD0000006114:I25  -
  • p starts out unset, which awk treats as 0. When p==0 and $4 matches :I, we set p=1.
  • While p==1, we skip the current record and move on to the next one.
  • When p==1 and $4 matches :E, we set p=2, which lets that record and all remaining records print.
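If the goal is simply "print everything from the first :E row onward", a single flag may be enough. The following is a minimal sketch under that assumption, not a replacement for the command above: the variable name found is arbitrary, and the sample rows are copied from the question to stand in for the output of grep ... | sort -V.

```shell
# Sample rows from the question, already in sorted order
# (stands in for: grep ENSG00000006114 File | sort -V)
input='chr17 35874900 35879174 ABCD0000006114:I25 -
chr17 35874901 35879174 ABCD0000006114:I25 -
chr17 35875548 35875671 ABCD0000006114:E27 -
chr17 35875672 35877289 ABCD0000006114:I26 -
chr17 35877290 35877445 ABCD0000006114:E26 -
chr17 35877446 35877932 ABCD0000006114:I25 -'

# Set the flag on the first row whose 4th column contains :E,
# then print every row for which the flag is already set.
result=$(printf '%s\n' "$input" | awk '$4 ~ /:E/ {found=1} found')
printf '%s\n' "$result"
```

One difference worth noting: if the sorted input already starts with a :E row, this version prints every row, whereas the three-state version above would still drop the first run of :I rows that follows it. Any rows before the first :E are dropped here whether or not they contain :I.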