light2yellow light2yellow - 4 months ago 8
Python Question

Efficient way to compare two lists remembering the origin for every unique element

Given this sample, which finds unique elements while being available to tell the element source

source_list = ["one", "two", "three", "four", "five"]
diff_list = ["zero", "one", "two", "three", "four", "six", "seven"]

source_unique = []
diff_unique = []

for entry in source_list:
if entry not in diff_list:
source_unique.append(entry)

for entry in diff_list:
if entry not in source_list:
diff_unique.append(entry)

print("Unique elements in source_list: {0}".format(source_unique))
print("Unique elements in diff_list: {0}".format(diff_unique))

###
# Unique elements in source_list: ['five']
# Unique elements in diff_list: ['zero', 'six', 'seven']


is there a more efficient way to do this instead of using two additional lists and all that stuff? The main task is to be able to tell the elements' origin.

Jim Jim
Answer

By using sets and taking their difference which has a complexity of O(len(set_value)):

>>> s1, s2 = set(source_list), set(diff_list)
>>> s1.difference(s2)
{'five'}
>>> s2.difference(s1)
{'seven', 'six', 'zero'}

in this case, you might need to transform to a list afterwards, if necessary.

Or, you could do the same thing by using a list comprehension and making source_list and diff_list sets for fast membership testing:

For the uniques list:

source_unique = [v1 for v1 in source_list if v1 not in set(diff_list)]
source_unique
['five']

For the diff_unique list:

diff_unique = [v1 for v1 in diff_list if v1 not in set(source_list)]
diff_unique
['zero', 'six', 'seven']

Which is again O(len(list)) unless I'm mistaking my time complexities.