Justin - 5 months ago
Bash Question

Counting the number of lines that have the same entry in the first column in bash

I have a data file that looks like the following:

123456, 1623326
123456, 2346525
123457, 2435466
123458, 2564252
123456, 2435145


The first column is the "ID" -- a string variable. The second column does not matter to me. I want to end up with

123456, 3
123457, 1
123458, 1


where the second column now counts how many lines in the original file have that "ID" in the first column.

Any solution in bash or Perl would be fantastic. Even Stata would be fine, though I suspect it is harder to do there. Please let me know if anything is unclear. Thanks!

Answer
cut -d',' -f1 in.txt | sort | uniq -c | awk '{print $2 ", " $1}'

gives:

123456, 3
123457, 1
123458, 1
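
The pipeline above works because sort puts identical IDs next to each other, which uniq -c needs in order to count them. As an alternative, here is a minimal sketch that does the tally in awk alone with an associative array. It assumes the file is named in.txt as in the command above and that IDs have no leading whitespace. The array name count and the output file counts.txt are made up for illustration.

```shell
# Recreate the sample data from the question (in.txt is the name used in the answer).
cat > in.txt <<'EOF'
123456, 1623326
123456, 2346525
123457, 2435466
123458, 2564252
123456, 2435145
EOF

# Split each line on commas and tally the first field in an associative array.
# awk's for-in iteration order is unspecified, so sort the (small) result afterwards.
awk -F',' '{ count[$1]++ } END { for (id in count) print id ", " count[id] }' in.txt \
  | sort > counts.txt

cat counts.txt
```

This version sorts only the unique IDs rather than every input line, which may help on large files. The trade-off is that awk holds one counter per distinct ID in memory.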