con con - 4 years ago 150
Bash Question

uniq -c unable to count unique lines

I am trying to count unique occurrences of numbers in the 3rd column of a text file, using a very simple command:

awk 'BEGIN {FS = "\t"}; {print $3}' bisulfite_seq_set0_v_set1.tsv | uniq -c

which should say something like

1 10103
2 2093
3 109

but instead it prints nonsense, where the same number is counted multiple times, like

20 1
1 2
1 1
1 2
14 1
1 2

I've also tried

awk 'BEGIN {FS = "\t"}; {print $3}' bisulfite_seq_set0_v_set1.tsv | sed -e 's/ //g' -e 's/\t//g' | uniq -c

I've tried every combination I can think of from the uniq man page. How can I correctly count the unique occurrences of numbers with uniq?

Answer Source

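uniq -c only counts contiguous repeats, so identical values that are not adjacent are reported as separate runs. To count every occurrence, you need to sort the input first (pipe it through sort before uniq -c). With awk, however, you don't need uniq at all:

$ awk ... '{count[$3]++} END{for(c in count) print count[c], c}' file

will do. Note that the iteration order of for (c in count) is unspecified, so sort the output if the order matters.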
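
Both approaches can be sketched side by side. This is only an illustration: sample.tsv and its contents are made up for the example and are not the file from the question, and the final sort on the awk output is there only to make the order predictable.

```shell
# Hypothetical sample data: three tab-separated columns.
# The values in column 3 (1, 2, 1, 1, 2) are deliberately not grouped.
printf 'a\tx\t1\nb\ty\t2\nc\tz\t1\nd\tw\t1\ne\tv\t2\n' > sample.tsv

# Approach 1: sort first so equal values become adjacent, then count runs.
sorted_counts=$(cut -f3 sample.tsv | sort | uniq -c)

# Approach 2: count in awk with an associative array; no sort is needed
# for the counting itself. The trailing sort only fixes the output order.
awk_counts=$(awk -F'\t' '{count[$3]++} END {for (c in count) print count[c], c}' sample.tsv | sort -k2,2n)

printf '%s\n' "$sorted_counts"
printf '%s\n' "$awk_counts"
```

Both variables hold the same counts per value. uniq -c pads its counts with leading whitespace, so the two outputs may differ in spacing.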
