Andrius - 2 months ago
Python Question

Python - separate duplicate objects into different list

Say I have this class:

class Spam(object):
    def __init__(self, a):
        self.a = a


And now I have these objects:

s1 = Spam((1, 1, 1, 4))
s2 = Spam((1, 2, 1, 4))
s3 = Spam((1, 2, 1, 4))
s4 = Spam((2, 2, 1, 4))
s5 = Spam((2, 1, 1, 8))
s6 = Spam((2, 1, 1, 8))
objects = [s1, s2, s3, s4, s5, s6]


After running some method, I need two lists: one holding the objects that share their a attribute value with at least one other object, and one holding the objects whose a attribute value is unique.

Like this:

dups = [s2, s3, s5, s6]
normal = [s1, s4]


So it is similar to finding duplicates, except that the first occurrence of each repeated a attribute value must also go into the duplicates list.

I have written this method and it seems to be working, but it is quite ugly in my opinion (and probably not very optimal).

def eggs(objects):
    vals = []
    dups = []
    normal = []
    for obj in objects:
        if obj.a in vals:
            dups.append(obj)
        else:
            normal.append(obj)
            vals.append(obj.a)
    dups_vals = [o.a for o in dups]
    # separate again
    new_normal = []
    for n in normal:
        if n.a in dups_vals:
            dups.append(n)
        else:
            new_normal.append(n)
    return dups, new_normal


Can anyone suggest a more Pythonic approach to this problem?

Answer

I would group together the objects in a dictionary, using the a attribute as the key. Then I would separate them by the size of the groups.

import collections

def separate_dupes(seq, key_func):
    d = collections.defaultdict(list)
    for item in seq:
        d[key_func(item)].append(item)
    dupes   = [item for v in d.values() for item in v if len(v) > 1]
    uniques = [item for v in d.values() for item in v if len(v) == 1]
    return dupes, uniques

class Spam(object):
    def __init__(self, a):
        self.a = a
    # This method is not necessary for the solution; it just displays the results nicely.
    def __repr__(self):
        return "Spam({})".format(self.a)

s1 = Spam((1, 1, 1, 4))
s2 = Spam((1, 2, 1, 4))
s3 = Spam((1, 2, 1, 4))
s4 = Spam((2, 2, 1, 4))
s5 = Spam((2, 1, 1, 8))
s6 = Spam((2, 1, 1, 8))
objects = [s1, s2, s3, s4, s5, s6]

dupes, uniques = separate_dupes(objects, lambda item: item.a)
print(dupes)
print(uniques)

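Result (groups appear in dict iteration order, which is insertion order only on Python 3.7+, so the order of dupes may differ from what is shown here):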

[Spam((2, 1, 1, 8)), Spam((2, 1, 1, 8)), Spam((1, 2, 1, 4)), Spam((1, 2, 1, 4))]
[Spam((1, 1, 1, 4)), Spam((2, 2, 1, 4))]
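
If the original order of the input matters (the question's expected output lists s2, s3, s5, s6 in that order), a variation on the same grouping idea is to count the keys first and then partition the input in a single ordered pass. This is only a sketch, not part of the answer above: the name separate_dupes_ordered is made up for illustration, and it assumes the keys are hashable, as the tuples here are.

```python
from collections import Counter


class Spam(object):
    def __init__(self, a):
        self.a = a

    # Only for readable output, as in the answer above.
    def __repr__(self):
        return "Spam({})".format(self.a)


def separate_dupes_ordered(seq, key_func):
    """Hypothetical helper: split seq into (dupes, uniques), keeping input order."""
    items = list(seq)                        # allow any iterable, not just lists
    keys = [key_func(item) for item in items]
    counts = Counter(keys)                   # how many items share each key
    dupes = [item for item, k in zip(items, keys) if counts[k] > 1]
    uniques = [item for item, k in zip(items, keys) if counts[k] == 1]
    return dupes, uniques


s1 = Spam((1, 1, 1, 4))
s2 = Spam((1, 2, 1, 4))
s3 = Spam((1, 2, 1, 4))
s4 = Spam((2, 2, 1, 4))
s5 = Spam((2, 1, 1, 8))
s6 = Spam((2, 1, 1, 8))
objects = [s1, s2, s3, s4, s5, s6]

dupes, uniques = separate_dupes_ordered(objects, lambda item: item.a)
print(dupes)    # [Spam((1, 2, 1, 4)), Spam((1, 2, 1, 4)), Spam((2, 1, 1, 8)), Spam((2, 1, 1, 8))]
print(uniques)  # [Spam((1, 1, 1, 4)), Spam((2, 2, 1, 4))]
```

Because the partition walks the input rather than the dictionary, the result does not depend on dict ordering and so is the same on any Python version.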