I am working in Python (2.7.9) and am trying to filter a list of tuples by a list of elements of those tuples. In particular, my objects have the following form:
tuples = [('a', ['a1', 'a2']), ('b',['b1', 'b2']), ('c',['c1', 'c2'])]
filter = ['a', 'c']
tuples_filtered = [(x,y) for (x,y) in tuples if x in filter]
tuples_filtered = [('a', ['a1', 'a2']), ('c',['c1', 'c2'])]
In a comment you write:
The filter list contains 30,000 words and the list of tuples contains about 134,000 2-tuples.
in containment tests against a list takes O(N) linear time, which is slow when you do this 134k times. Each time you have to iterate over all those elements to find a match. Given that you are filtering, not all those first elements are going to be present in the 30k list, so you are executing up to 30k * 134k == 4 billion comparisons.
Use a set instead:
filter_set = set(filter)
Set containment tests are O(1) constant time; now you reduced your problem to 134k tests.
A much smaller component of time you can avoid spending is the tuple assignment; use indexing to extract just the one element you are testing with:
tuples_filtered = [tup for tup in tuples if tup in filter_set]