meleu meleu - 6 months ago 9
Linux Question

How to put sequential numbers at the end of repeated data in a line?

I have a file with some repeated information. The lines are numbered, followed by a colon, followed by the information. I want to put a sequential number only at the end of the repeated information.

Example.

Input:

1:Jose da Silva
2:Jose da Silva
3:Fulano de Tal
4:Jose da Silva
5:Sicrano Pereira
6:Ze Ruela
7:Sicrano Pereira
8:Jose da Silva


Output:

1:Jose da Silva #1
2:Jose da Silva #2
3:Fulano de Tal
4:Jose da Silva #3
5:Sicrano Pereira #1
6:Ze Ruela
7:Sicrano Pereira #2
8:Jose da Silva #4


[This question differs from this one because here the lines are always different (every line has a different number). My input/output examples may look very similar, but in the real application they are not.]

Answer

Tweaking my previous answer:

awk -F: 'FNR==NR {count[$2]++; next}
         count[$2]>1 {$0=$0 OFS "#"++times[$2]}
         1' file file

That is: the file is passed twice, so awk reads it twice, and FNR==NR is true only during the first read. On the first pass, count how many times each second field occurs. On the second pass, append an incrementing number to the lines whose second field appears more than once. So instead of comparing the whole line, it compares only the second field, which is everything after the colon.
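One caveat: with -F: the second field stops at the next colon, so information that itself contains a colon would be compared only up to that point. A minimal sketch of the same two-pass idea that takes everything after the first colon as the key, assuming that is the behaviour you want (the function name number_repeats and the variable key are made up for illustration):

```shell
# Hypothetical helper: number_repeats FILE
# Same two-pass approach as above, but the key is everything after the
# FIRST colon, so extra colons inside the information are kept in the key.
number_repeats() {
    awk '
        # Runs for every line of both passes: extract the key.
        { key = substr($0, index($0, ":") + 1) }

        # Pass 1 (FNR==NR holds only while reading the first copy): count keys.
        FNR == NR { count[key]++; next }

        # Pass 2: append a running number to keys that occur more than once.
        count[key] > 1 { $0 = $0 " #" (++times[key]) }

        # Print every line of the second pass, modified or not.
        { print }
    ' "$1" "$1"
}
```

It would be called as number_repeats file. For input without extra colons it should behave like the command above.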

$ awk -F: 'FNR==NR {count[$2]++; next} count[$2]>1 {$0=$0 OFS "#"++times[$2]}1' file file
1:Jose da Silva #1
2:Jose da Silva #2
3:Fulano de Tal
4:Jose da Silva #3
5:Sicrano Pereira #1
6:Ze Ruela
7:Sicrano Pereira #2
8:Jose da Silva #4
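
Reading the file twice only works when the input is a real file. If the data comes from a pipe, a possible variant (not the approach above, but a single pass that buffers all lines in memory) is sketched here. The function name number_repeats_stdin and the array names are made up for illustration, and it assumes the input fits in memory:

```shell
# Hypothetical helper: reads from standard input instead of a named file.
number_repeats_stdin() {
    awk -F: '
        # Remember every line and its second field, and count each field.
        { line[NR] = $0; key[NR] = $2; count[$2]++ }

        # After all input is read, replay the lines in their original order.
        END {
            for (i = 1; i <= NR; i++) {
                k = key[i]
                if (count[k] > 1)
                    print line[i] " #" (++times[k])   # repeated: add counter
                else
                    print line[i]                     # unique: leave as-is
            }
        }
    '
}
```

Usage would be something like: cat file | number_repeats_stdin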