deltap - 1 year ago 77
Python Question

# Using a list comprehension to label data that is common to two lists

I have two lists, A and B. I want to generate a third list whose entry is 1 if the corresponding string in A ends with one of the entries in B, and 0 otherwise.

```python
A = ['Mary Sue', 'John Doe', 'Alice Stella', 'James May', 'Susie May']
B = ['Smith', 'Stirling', 'Doe']
```

I want a list comprehension that will give the result

```python
[0, 1, 0, 0, 0]
```

Keep in mind that this is a specific case of a more general problem. Elements in A can contain arbitrary whitespace and an arbitrary number of words. Likewise, elements in B can contain an arbitrary number of words. For example,

```python
A = ['  Tom Barry Stirling Adam', 'Maddox Smith', 'George Washington Howard Smith']
B = ['Washington Howard Smith', 'Stirling Adam']
```

should return

```python
[1, 0, 1]
```

So far I have the following:

```python
[1 if y.endswith(x) else 0 for x in B for y in A]
```

However, the returned list has `len(A) * len(B)` elements instead of `len(A)`, because it produces a 0 or 1 for every combination of `A[i]` and `B[j]`. I am not interested in solutions using explicit for loops; I need a list comprehension for speed.
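To make the dimension problem concrete with the first example's data: in the attempted comprehension the loop over `B` is the outer loop, so it yields one value per (`B[j]`, `A[i]`) pair. Moving the loop over `B` inside `any` instead yields one value per element of `A`. This is a small illustration only, not a claim about the fastest approach.

```python
A = ['Mary Sue', 'John Doe', 'Alice Stella', 'James May', 'Susie May']
B = ['Smith', 'Stirling', 'Doe']

# The attempted comprehension: B is the outer loop and A the inner loop,
# so it produces one 0/1 per (B[j], A[i]) pair.
flat = [1 if y.endswith(x) else 0 for x in B for y in A]
print(len(flat))  # 15, i.e. len(B) * len(A)

# Putting the loop over B inside any() collapses those pairs
# into a single 0/1 per element of A.
labels = [int(any(y.endswith(x) for x in B)) for y in A]
print(labels)  # [0, 1, 0, 0, 0]
```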

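A much faster way than testing each suffix separately is to pass a tuple of suffixes to `str.endswith`, which returns `True` if the string ends with any of them: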

```
In [8]: A = ['Mary Sue', 'John Doe', 'Alice Stella', 'James May', 'Susie May']

In [9]: B = ['Smith', 'Stirling', 'Doe']

In [10]: A *= 1000

In [11]: %%timeit
t = tuple(B)
[int(s.endswith(t)) for s in A]
....:
100 loops, best of 3: 5.02 ms per loop

In [12]: timeit [int(any(full.endswith(last) for last in B)) for full in A]
100 loops, best of 3: 21.3 ms per loop
```
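For readers outside IPython, a rough equivalent of this comparison can be sketched with the standard-library `timeit` module. The function names `with_tuple` and `with_any` are hypothetical labels for the two comprehensions in the session, and the loop counts are chosen arbitrarily; absolute timings will differ by machine and Python version, so no numbers are claimed here.

```python
import timeit

A = ['Mary Sue', 'John Doe', 'Alice Stella', 'James May', 'Susie May'] * 1000
B = ['Smith', 'Stirling', 'Doe']

def with_tuple(A, B):
    # One endswith call per element of A; str.endswith accepts a tuple of suffixes.
    t = tuple(B)
    return [int(s.endswith(t)) for s in A]

def with_any(A, B):
    # Up to len(B) endswith calls per element of A, plus generator overhead.
    return [int(any(full.endswith(last) for last in B)) for full in A]

# Both approaches must agree before their timings are worth comparing.
assert with_tuple(A, B) == with_any(A, B)

NUMBER = 20  # calls per timing run (arbitrary choice for this sketch)
for fn in (with_tuple, with_any):
    # Best of 3 runs; each run calls fn NUMBER times.
    best = min(timeit.repeat(lambda: fn(A, B), number=NUMBER, repeat=3))
    print(f'{fn.__name__}: {best / NUMBER * 1000:.2f} ms per call')
```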

This makes one `endswith` call per element of `A`, as opposed to potentially one call per element of `B` for each element of `A`, and it avoids the overhead of the generator expression passed to `any`.
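The tuple approach compares raw characters. Trailing whitespace or a different number of spaces between words therefore defeats it, and a short entry in `B` can match the tail of a longer word (for example, `'Adam'` matches `'Tom McAdam'`). If "arbitrary white space" is meant to say that whitespace runs are insignificant and that matches should fall on word boundaries (an assumption; the question does not say so), one sketch normalizes both lists before using the same tuple trick. The function name `label` is hypothetical.

```python
def label(A, B):
    # Assumption: runs of whitespace are insignificant, so split() + join()
    # collapses them and drops leading/trailing whitespace.
    # Each normalized string gets one leading space so a suffix can only
    # match at a word boundary: ' Tom McAdam' does not end with ' Adam'.
    suffixes = tuple(' ' + ' '.join(b.split()) for b in B)
    return [int((' ' + ' '.join(a.split())).endswith(suffixes)) for a in A]

print(label(['  Tom Barry Stirling Adam', 'Maddox Smith', 'George Washington Howard Smith'],
            ['Washington Howard Smith', 'Stirling Adam']))  # [1, 0, 1]
print(label(['Stirling   Adam  ', 'Tom McAdam'], ['Stirling Adam', 'Adam']))  # [1, 0]
```

On the last input, the plain `[int(s.endswith(tuple(B))) for s in A]` would give `[0, 1]` instead, because of the trailing spaces and the partial-word match. The normalization adds per-element work, so it gives up some of the speed advantage in exchange for those semantics.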
