dan martin - 3 months ago
Python Question

Python - comparing lists of dictionaries using tuples - unexpected behaviour?

I've been trying to compare two lists of dictionaries to find the userids of new people in list2 that aren't in list1. For example, the first list:

list1 = [{"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"}, {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"}]


and the second list:

list2 = [{"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"}, {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"}, {"userid": "34892", "name": "daniel", "age": "64", "occupation": "chef"}]


the desired output:

newpeople = ['34892']


This is what I've managed to put together:

list1tuple = ((d["userid"]) for d in list1)
list2tuple = ((d["userid"]) for d in list2)

newpeople = [t for t in list2tuple if t not in list1tuple]


This actually seems to be pretty efficient, especially considering the lists I am using might contain over 50,000 dictionaries. However, here's the issue:

If it finds a userid in list2 that indeed isn't in list1, it adds it to newpeople (as desired), but then also adds every other userid that comes afterwards in list2 to newpeople as well.

So, say list2 contains 600 userids and the 500th userid in list2 isn't found anywhere in list1, the first item in newpeople will be the 500th userid (again, as desired), but then followed by the other 100 userids that came after the new one.

This is pretty perplexing to me - I'd greatly appreciate anyone helping me get to the bottom of why this is happening.

Answer

Currently you have set list1tuple and list2tuple as:

list1tuple = ((d["userid"]) for d in list1)
list2tuple = ((d["userid"]) for d in list2)

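These are generator expressions, not lists or tuples (parentheses alone do not create a tuple), so each one can be iterated over only once. Every "t not in list1tuple" test advances list1tuple until it finds a match. The first userid that is missing runs the generator to its end, so every later test sees an empty generator and reports its userid as new. That is why everything after the first new userid ends up in newpeople.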
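A minimal sketch of that exhaustion effect, using made-up ids rather than the data from the question (the names old_ids, new_ids and missing are purely illustrative):

```python
# Hypothetical sample ids, chosen only to illustrate the behaviour.
old_ids = (uid for uid in ["a", "b", "c"])  # a generator, not a tuple
new_ids = ["a", "x", "b", "c"]

# Each membership test resumes where the previous one stopped.
# Testing "x" runs the generator to its end without a match, so the
# later tests for "b" and "c" see an empty generator and also fail.
missing = [uid for uid in new_ids if uid not in old_ids]
print(missing)  # ['x', 'b', 'c']
```

Only "x" is really new, yet "b" and "c" are reported too, which mirrors the behaviour described in the question.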

You could change them to be lists:

list1tuple = [d["userid"] for d in list1]
list2tuple = [d["userid"] for d in list2]

which would allow you to iterate over them as many times as you like. However, each membership test on a list scans it from the start, which gets slow with 50,000 entries. A better solution is to make them sets:

list1tuple = set(d["userid"] for d in list1)
list2tuple = set(d["userid"] for d in list2)

And then take the set difference, which gives the userids that appear only in list2:

newpeople = list2tuple - list1tuple
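
Note that the set difference returns a set, which has no defined order. If you need a list in list2's order, as in the desired output, one possible variation is to keep a set only for the lookups. This is a sketch under that assumption, and known_ids is a name introduced here for illustration:

```python
list1 = [
    {"userid": "13451", "name": "james", "age": "24", "occupation": "doctor"},
    {"userid": "94324", "name": "john", "age": "33", "occupation": "pilot"},
]
list2 = list1 + [
    {"userid": "34892", "name": "daniel", "age": "64", "occupation": "chef"},
]

# A set can be tested for membership any number of times, and each
# test takes roughly constant time regardless of the set's size.
known_ids = {d["userid"] for d in list1}

# Walk list2 once, keeping ids that are not in list1, in list2's order.
newpeople = [d["userid"] for d in list2 if d["userid"] not in known_ids]
print(newpeople)  # ['34892']
```

If the same userid can appear more than once in list2, this version keeps the repeats, whereas the plain set difference would not.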