Tiago Bruno - 1 year ago
Linux Question

How can I use awk to find consecutive patterns in lines?

I am trying to write an awk script that counts runs of consecutive lines with the same pattern in the 3rd field, and prints the first and last coordinate (2nd field) of each run, as in the example below.

I have a script that counts the number of patterns in any coordinate window I choose, for example windows of 1000000, with each window labelled by its midpoint:

awk '{a[$1 FS 1000000*int(($2-1)/1000000)+500000]++} END{for(k in a) print k,a[k]}' file


However, it counts all patterns together, regardless of whether they are 1/1 or 0/1.
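
One way to make the window count distinguish the two patterns would be to add the third field to the array key. This is only a sketch: the variable names `sample` and `window_counts` and the four input rows are made up for illustration, and it still counts per window, not per consecutive run.

```shell
# Hypothetical sample rows, kept in a shell variable for illustration.
sample='17 38172452 1/1
17 38172942 1/1
17 38173143 0/1
17 39000001 0/1'

# Key on chromosome, window midpoint and pattern, so 1/1 and 0/1 are counted separately.
# The order of "for (k in a)" is unspecified in awk, hence the sort.
window_counts=$(printf '%s\n' "$sample" |
    awk '{a[$1 FS 1000000*int(($2-1)/1000000)+500000 FS $3]++}
         END{for(k in a) print k,a[k]}' | sort)
echo "$window_counts"
```

Each output line then holds the chromosome, the window midpoint, the pattern and its count within that window.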

17 38172452 1/1
17 38172942 1/1
17 38172973 1/1
17 38173143 0/1
17 38176256 0/1
17 38176476 1/1
17 38178149 0/1
17 38178627 0/1
17 38179275 0/1
17 38179290 0/1
17 38179492 0/1
17 38179667 1/1
17 38182229 0/1
17 38183090 0/1
17 38183505 0/1
17 38188419 0/1
17 38188844 0/1
17 38189049 0/1


Expected result:

17 38172452 38172973 3 1/1
17 38173143 38176256 2 0/1
17 38178149 38179492 5 0/1
17 38182229 38189049 6 0/1


Can you guys help me out with this?

Answer

Assuming $1 does not change across the file:

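awk '{if(p==$3) {c++; e=$2}                  # same pattern: extend the current run
      else {if(c>1) print $1,b,e,c,p;        # pattern changed: report the finished run
            b=$2; c=1; p=$3}}                # start a new run at this line
 END {if(c>1) print $1,b,e,c,p}' file        # flush the last run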
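
If the first field can change, the same idea can be extended by remembering it alongside the pattern. The following is a minimal sketch under that assumption, not the answer's original code: the names `chr`, `sample` and `runs` are chosen here for illustration, and the input is a shortened sample in which the chromosome 18 rows are invented.

```shell
# Shortened sample; the two chromosome 18 rows are invented for illustration.
sample='17 38172452 1/1
17 38172942 1/1
17 38172973 1/1
17 38173143 0/1
17 38176256 0/1
17 38176476 1/1
18 100 1/1
18 200 1/1'

runs=$(printf '%s\n' "$sample" | awk '
    # Same chromosome and same pattern as the previous line: extend the current run.
    $1 == chr && $3 == p { c++; e = $2; next }
    # Otherwise report the finished run (if longer than one line) and start a new one.
    { if (c > 1) print chr, b, e, c, p
      chr = $1; b = e = $2; c = 1; p = $3 }
    # Flush the last run once the input is exhausted.
    END { if (c > 1) print chr, b, e, c, p }')
echo "$runs"
```

On this sample it prints three runs. The single 1/1 line at 38176476 is skipped and is not merged with the chromosome 18 rows, even though they share the same pattern.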