bob_cobb - 4 months ago
Python Question

Remove duplicate items by value from a dictionary

I've got the following dictionary:

potential_duplicates = {
432L: (u'one two three', u'one two three'),
433L: (u'one two three', u'one two three'),
434L: (u'whole foods', u'whole foods'),
435L: (u'whole foods', u'whole foods'),
437L: (u'this is a dupe', u'this is a dupe'),
438L: (u'this is a dupe', u'this is a dupe'),
439L: (u'this is a dupe', u'this is a dupe')
}


I'm removing duplicate entries from my database, so I want to keep one entry from each group of duplicates here and put the others in a list of duplicates that need to be removed.

Can I do it with this structure or should I be using lists instead?

Answer

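You can do this with two nested dictionary comprehensions. The inner one consolidates the duplicates by swapping keys and values, so each distinct value ends up with a single key (this works because the values are tuples, which are hashable). The outer one swaps them back into the original form. Which key survives for each value depends on the dict's iteration order.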

>>> {k:v for v,k in {v:k for k,v in potential_duplicates.items()}.items()}
{433L: (u'one two three', u'one two three'), 435L: (u'whole foods', u'whole foods'), 439L: (u'this is a dupe', u'this is a dupe')}
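If it matters which entry of each group is kept, the choice can be made explicit instead of leaving it to iteration order. The following is a minimal sketch, assuming you want to keep the lowest key for each value. It is written for Python 3, so the `L` and `u` prefixes from the question are dropped, and `dedupe_keep_lowest` is a hypothetical helper name, not something from the original answer.

```python
def dedupe_keep_lowest(items):
    """Return a dict that keeps only the lowest key for each distinct value."""
    first_key_for_value = {}
    for key in sorted(items):
        # setdefault stores the key only the first time a value is seen,
        # so later (higher) keys with the same value are ignored.
        first_key_for_value.setdefault(items[key], key)
    # Swap back to the original key -> value layout.
    return {key: value for value, key in first_key_for_value.items()}


potential_duplicates = {
    432: ('one two three', 'one two three'),
    433: ('one two three', 'one two three'),
    434: ('whole foods', 'whole foods'),
    435: ('whole foods', 'whole foods'),
    437: ('this is a dupe', 'this is a dupe'),
    438: ('this is a dupe', 'this is a dupe'),
    439: ('this is a dupe', 'this is a dupe'),
}

kept = dedupe_keep_lowest(potential_duplicates)
print(sorted(kept))  # [432, 434, 437]
```

Sorting the keys first is what makes the result reproducible. To keep the highest key instead, iterate over `sorted(items, reverse=True)`.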

To get a list of the keys that were removed, use a list comprehension to compare the two dicts:

>>> kept = {k:v for v,k in {v:k for k,v in potential_duplicates.items()}.items()}
>>> removed = [k for k in potential_duplicates.keys() if k not in kept]
>>> removed
[432L, 434L, 437L, 438L]
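The kept dict and the removed list can also be built in a single pass by tracking which values have already been seen. This is a sketch under the same assumptions as the question's data (hashable tuple values), written for Python 3. `split_duplicates` and `sample` are hypothetical names, and keeping the lowest key per value is an assumed policy.

```python
def split_duplicates(items):
    """Split items into (kept, removed): kept holds the first key seen for
    each distinct value, removed lists the keys of the later duplicates."""
    seen_values = set()
    kept = {}
    removed = []
    for key in sorted(items):  # sorted so the lowest key of each group is kept
        value = items[key]
        if value in seen_values:
            removed.append(key)
        else:
            seen_values.add(value)
            kept[key] = value
    return kept, removed


sample = {
    432: ('one two three', 'one two three'),
    433: ('one two three', 'one two three'),
    434: ('whole foods', 'whole foods'),
    435: ('whole foods', 'whole foods'),
    437: ('this is a dupe', 'this is a dupe'),
    438: ('this is a dupe', 'this is a dupe'),
    439: ('this is a dupe', 'this is a dupe'),
}

kept, removed = split_duplicates(sample)
print(sorted(kept))  # [432, 434, 437]
print(removed)       # [433, 435, 438, 439]
```

The `removed` list can then be used to delete the duplicate rows from the database, while `kept` holds the one entry retained from each group.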