nomis6432 - 1 year ago 84
Python Question

# Best way to count duplicates between 2 lists

I've seen some posts which cover how to count duplicates from 2 unique lists(without duplicates in the lists themselves) but not a good solution on how to do this for other lists so I came up with some solutions myself but I doubt they are well optimized.

My first version:

``````def version1(lst1,lst2):
lst = list(set(lst1))
return sum(min(lst1.count(lst[i]), lst2.count(lst[i])) for i in range(len(lst)))
``````

second version:

``````def version2(lst1,lst2):
lst = [i for n,i in enumerate(lst1) if i not in lst1[:n]]
return sum(min(lst1.count(lst[i]), lst2.count(lst[i])) for i in range(len(lst)))
``````

third version:

``````def version3(lst1,lst2):
lst = list(dict.fromkeys(lst1).keys())
return sum(min(lst1.count(lst[i]), lst2.count(lst[i])) for i in range(len(lst)))
``````

speed test:

``````import random

lst1 = [random.randint(0,101) for i in range(0,10000)]
lst2 = [random.randint(0,101) for i in range(0,10000)]

from timeit import default_timer as timer

print("first result:",end=" ")
START = timer()
print(version1(lst1,lst2))
END = timer()
print("first time:",END-START)

print("second result:", end=" ")
START = timer()
print(version2(lst1,lst2))
END = timer()
print("second time:",END-START)

print("third result:", end=" ")
START = timer()
print(version3(lst1,lst2))
END = timer()
print("third time:",END-START)
``````

Version 2 scores always below 1 and 3 but version 1 scores better with lists with a lot of duplicates within the list itself and 3 better with lists without a lot of duplicates within the list itself.

I was wondering if someone knows a better way to do this.

You can use `collections.Counter` which support `&` operation:

``````>>> from collections import Counter
>>> Counter([1,2,2,4]) & Counter([2,2,3,4]) # {2:2, 1:1, 4:1} AND {2:2, 3:1, 4:1}
Counter({2: 2, 4: 1})
>>> sum(_.values())
3
``````

``````from collections import Counter
def another(lst1, lst2):
return sum((Counter(lst1) & Counter(lst2)).values())
``````

UPDATE

Here's the modified `version1`. You don't need to convert `set` back to `list` to access items using indexes; Just iterate items:

``````def version1_modified(lst1, lst2):
return sum(min(lst1.count(x), lst2.count(x)) for x in set(lst1))
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download