
Using a list comprehension to label data that is common to two lists

I have two lists, A and B. I want to generate a third list that contains 1 where the corresponding entry in A ends with one of the entries in B, and 0 otherwise.

A = ['Mary Sue', 'John Doe', 'Alice Stella', 'James May', 'Susie May']
B = ['Smith', 'Stirling', 'Doe']

I want a list comprehension that will give the result

[0, 1, 0, 0, 0]

Keep in mind that this is a specific case of a more general problem. Elements in A can have arbitrary whitespace and contain an arbitrary number of words. Likewise, elements in B can have an arbitrary number of words. For example,

A = [' Tom Barry Stirling Adam', 'Maddox Smith', 'George Washington Howard Smith']
B = ['Washington Howard Smith', 'Stirling Adam']

should return

[1, 0, 1]
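If "arbitrary white space" also covers trailing spaces or runs of several spaces between words, a plain `endswith` check will miss matches such as `'Stirling Adam '`. A minimal sketch, assuming it is acceptable to collapse whitespace before comparing: the helper `normalize` is hypothetical, and it relies only on `str.split()` with no arguments (which splits on any run of whitespace) and on `str.endswith`, which accepts a tuple of suffixes.

```python
# Assumption: whitespace differences (leading, trailing, or repeated inner
# spaces) should not affect whether an entry in A "ends with" an entry in B.
def normalize(s):
    # Collapse every run of whitespace to one space and drop leading/trailing whitespace.
    return " ".join(s.split())

A = [' Tom Barry Stirling Adam', 'Maddox Smith', 'George Washington Howard Smith']
B = ['Washington Howard Smith', 'Stirling Adam']

# Normalize the suffixes once; str.endswith accepts a tuple of candidates.
suffixes = tuple(normalize(b) for b in B)
labels = [int(normalize(a).endswith(suffixes)) for a in A]
print(labels)  # [1, 0, 1]

# Without normalization, a trailing space breaks the match:
print('Stirling Adam '.endswith(suffixes))               # False
print(normalize('Stirling  Adam ').endswith(suffixes))   # True
```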

So far I have the following

[1 if y.endswith(x) else 0 for x in B for y in A]

However, the returned list has the wrong length: it yields a 0 or 1 for every (A[i], B[j]) pair rather than one value per element of A. I am not interested in solutions that use explicit for loops; I need a list comprehension for speed.
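The extra length comes from the two `for` clauses: `for x in B for y in A` produces one element per (x, y) pair, so len(B) * len(A) items in total. A minimal sketch of the fix, using the first example's data: keep A as the only `for` clause of the outer comprehension and fold the check over B into a single boolean with `any()`.

```python
A = ['Mary Sue', 'John Doe', 'Alice Stella', 'James May', 'Susie May']
B = ['Smith', 'Stirling', 'Doe']

# The original attempt: one entry per (x, y) pair, so len(B) * len(A) items.
pairs = [1 if y.endswith(x) else 0 for x in B for y in A]
print(len(pairs))  # 15

# One entry per element of A: any() reduces the inner check over B to one bool.
labels = [int(any(y.endswith(x) for x in B)) for y in A]
print(labels)  # [0, 1, 0, 0, 0]
```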


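A much faster way than testing each suffix in B with any() is to pass a tuple to endswith, which returns True if the string ends with any of the tuple's elements: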

In [8]: A = ['Mary Sue', 'John Doe', 'Alice Stella', 'James May', 'Susie May']

In [9]: B = ['Smith', 'Stirling', 'Doe']

In [10]: A *= 1000

In [11]: %%timeit
t = tuple(B)
[int(s.endswith(t)) for s in A]
100 loops, best of 3: 5.02 ms per loop

In [12]: %timeit [int(any(full.endswith(last) for last in B)) for full in A]
100 loops, best of 3: 21.3 ms per loop

This makes one endswith call per element of A, instead of potentially one call per element of B for each element of A, and it avoids the overhead of the generator expression passed to any().
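To reproduce the comparison outside IPython, the standard-library `timeit` module can time both comprehensions. This is a sketch: the function names `with_tuple` and `with_any` are illustrative, absolute timings depend on the machine and Python version (so none are claimed here), and the script also checks that both approaches produce the same labels.

```python
import timeit

A = ['Mary Sue', 'John Doe', 'Alice Stella', 'James May', 'Susie May'] * 1000
B = ['Smith', 'Stirling', 'Doe']

def with_tuple(A, B):
    # str.endswith accepts a tuple of suffixes: one call per element of A.
    t = tuple(B)
    return [int(s.endswith(t)) for s in A]

def with_any(A, B):
    # One endswith call per suffix in B (until the first match),
    # plus a generator object for each element of A.
    return [int(any(full.endswith(last) for last in B)) for full in A]

# Both approaches must produce identical labels.
assert with_tuple(A, B) == with_any(A, B)

for fn in (with_tuple, with_any):
    # Best of 3 repeats of 100 calls each; report the time per call.
    best = min(timeit.repeat(lambda: fn(A, B), repeat=3, number=100))
    print(f"{fn.__name__}: {best / 100 * 1000:.2f} ms per loop")
```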