Brutalized - 2 months ago
Python Question

remove duplicate values from items in a dictionary in Python

How can I check and remove duplicate values from items in a dictionary?
I have a large dataset, so I'm looking for an efficient method. Here is an example dictionary entry whose value contains a duplicate:

'word': [('769817', [6]), ('769819', [4, 10]), ('769819', [4, 10])]


needs to become

'word': [('769817', [6]), ('769819', [4, 10])]

Answer

This problem boils down to removing duplicates from a list of unhashable items. Each tuple contains a list, and lists cannot be hashed, so the tuples cannot be hashed either. That rules out simply converting the list to a set.
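To see why the set approach fails, here is a minimal sketch using the example data. Hashing a tuple hashes each of its elements, so the inner list triggers a TypeError; the exact wording of the message varies between Python versions, but it reports an unhashable list.

```python
# Each item is a tuple whose second element is a list, e.g. ('769819', [4, 10]).
items = [('769817', [6]), ('769819', [4, 10]), ('769819', [4, 10])]

error_message = None
try:
    set(items)  # hashing each tuple requires hashing the list inside it
except TypeError as exc:
    error_message = str(exc)  # mentions "unhashable type: 'list'"

print(error_message)
```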

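One method is to build a new list for each value, appending an item only if it is not already in that new list. This keeps the original order, but each "in" check scans the new list, so the cost grows quadratically with the length of each list.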

d = {'word': [('769817', [6]), ('769819', [4, 10]), ('769819', [4, 10])]}
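for k, v in d.items():
    new_list = []
    for item in v:
        # Keep only the first occurrence; "not in" scans new_list linearly
        if item not in new_list:
            new_list.append(item)
    d[k] = new_list  # reassigning an existing key is safe while iterating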
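For a large dataset, a linear-time alternative (a different technique from the one above, not part of the original answer) is to convert each item into a hashable key by turning its inner list into a tuple, then track the keys already seen in a set. This sketch assumes every item is a (str, list) pair, as in the example; the helper names hashable_key and dedupe_values are hypothetical.

```python
def hashable_key(item):
    """Convert a (str, list) pair into a hashable (str, tuple) pair.

    Assumes every item has exactly the shape of the example data.
    """
    name, positions = item
    return (name, tuple(positions))


def dedupe_values(d):
    """Remove duplicate items from every list value, keeping first occurrences in order."""
    for k, v in d.items():
        seen = set()
        new_list = []
        for item in v:
            key = hashable_key(item)
            if key not in seen:  # set lookup is O(1) on average
                seen.add(key)
                new_list.append(item)  # store the original item, not the key
        d[k] = new_list
    return d


d = {'word': [('769817', [6]), ('769819', [4, 10]), ('769819', [4, 10])]}
print(dedupe_values(d))
# {'word': [('769817', [6]), ('769819', [4, 10])]}
```

Because the set stores only the converted keys, the original tuples (with their lists) are returned unchanged and in their original order.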

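Alternatively, use itertools.groupby() for a more concise answer. Because groupby() only merges consecutive equal items, the list must be sorted first. Sorting reorders the items and costs O(n log n), so whether this beats the quadratic membership check depends on how long the lists are.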
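The need to sort can be seen with a hypothetical reordered version of the example list, where the duplicates are not adjacent. This sketch uses sorted() rather than v.sort() so the input list is left unchanged.

```python
import itertools

# Hypothetical input: the two duplicates are separated by another item.
v = [('769819', [4, 10]), ('769817', [6]), ('769819', [4, 10])]

# Without sorting, groupby() sees three separate runs and keeps all three items.
unsorted_result = [item for item, _ in itertools.groupby(v)]

# After sorting, the duplicates are adjacent and collapse into one run.
sorted_result = [item for item, _ in itertools.groupby(sorted(v))]

print(unsorted_result)  # [('769819', [4, 10]), ('769817', [6]), ('769819', [4, 10])]
print(sorted_result)    # [('769817', [6]), ('769819', [4, 10])]
```

Sorting works here because tuples of (str, list of int) compare element by element; items that cannot be ordered against each other would make sorted() raise a TypeError.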

import itertools

d = {'word': [('769817', [6]), ('769819', [4, 10]), ('769819', [4, 10])]}
for k, v in d.items():
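    v.sort()  # groupby() only merges adjacent equal items, so sort first (in place)
    # Keep one item from each run of equal items
    d[k] = [item for item, _ in itertools.groupby(v)]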

Either approach leaves d as: {'word': [('769817', [6]), ('769819', [4, 10])]}
