K. Mao - 1 month ago
Python Question

Labeling duplicates in a list

Say I have a list of names in Python, such as the following:

names = ['Alice','Bob','Carl','Dave','Bob','Earl','Carl','Frank','Carl']


Now, I want to make every entry in this list unique, but I don't want to remove the duplicates. Instead, for each name that appears more than once, I want to append a suffix to that name, where the suffix is the number of times the name has appeared so far, while preserving the order of the list. Since there are 3 Carls in the list, I want to be able to refer to them as Carl_1, Carl_2, and Carl_3 respectively. So in this case the desired output is as follows:

names = ['Alice','Bob_1','Carl_1','Dave','Bob_2','Earl','Carl_2','Frank','Carl_3']


I can do this by looping through the list and modifying each name if it needs to be modified, for example with something like the following code.

def mark_duplicates(name_list):
    output = []
    duplicates = {}  # running count for each name that occurs more than once
    for name in name_list:
        if name_list.count(name) == 1:
            # unique names are kept unchanged
            output.append(name)
        else:
            if name in duplicates:
                duplicates[name] += 1
            else:
                duplicates[name] = 1
            output.append(name + "_" + str(duplicates[name]))
    return output


However, this is a lot of work and a lot of lines of code for something that I suspect shouldn't be very hard to do. Is there a simpler way to accomplish this, for example with a list comprehension or a package like itertools?

Answer

collections.Counter can help cut down on the bookkeeping a bit:

In [105]: from collections import Counter

In [106]: out = []

In [107]: fullcount = Counter(names)

In [108]: nc = Counter()

In [109]: for n in names:
     ...:     nc[n] += 1
     ...:     out.append(n if fullcount[n] == 1 else '{}_{}'.format(n, nc[n]))
     ...:

In [110]: out
Out[110]:
['Alice', 'Bob_1', 'Carl_1', 'Dave', 'Bob_2', 'Earl', 'Carl_2', 'Frank', 'Carl_3']
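
Since the question asks about list comprehensions and itertools, the same two-counter idea can also be sketched as a reusable function. This is only one possible variant, not part of the session above: the function name mark_duplicates is borrowed from the question, and the use of defaultdict with itertools.count for the running counters is my own assumption about what reads well.

```python
from collections import Counter, defaultdict
from itertools import count


def mark_duplicates(names):
    """Suffix each name that occurs more than once with its occurrence number."""
    totals = Counter(names)  # total occurrences of each name
    # Assumption for this sketch: one independent counter per name, starting at 1.
    seen = defaultdict(lambda: count(1))
    return [
        name if totals[name] == 1 else f"{name}_{next(seen[name])}"
        for name in names
    ]


names = ['Alice', 'Bob', 'Carl', 'Dave', 'Bob', 'Earl', 'Carl', 'Frank', 'Carl']
print(mark_duplicates(names))
```

Like the loop above, this makes one pass to count and one pass to build the result, so it avoids calling list.count for every element.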