Ultimate Zero - 6 months ago
Python Question

Python - Fastest way to check if a string contains specific characters in any of the items in a list

What is the fastest way to check whether a string contains any of the items in a list as a substring?

Currently, I'm using this method:

lestring = "Text123"

lelist = ["Text", "foo", "bar"]

for x in lelist:
    if lestring.count(x):
        print('Yep. "%s" contains characters from "%s" item.' % (lestring, x))


Is there any way to do it without iteration (which I suppose would make it faster)?

Answer

You can try a list comprehension (LC) with a membership check:

>>> lestring = "Text123"
>>> lelist = ["Text", "foo", "bar"]
>>> [e for e in lelist if e in lestring]
['Text']

The LC still loops implicitly, but it is faster than your implementation because the in operator avoids the explicit method call that count requires.

Compared to Joe's implementation, yours is much faster, because filter has to call two functions on every pass through the loop: the lambda and count.

>>> def joe(lelist, lestring):
    # Reconstructed from the description above: filter with a lambda that calls count
    return filter(lambda x: lestring.count(x), lelist)

>>> def uz(lelist, lestring):
    for x in lelist:
        if lestring.count(x):
            return 'Yep. "%s" contains characters from "%s" item.' % (lestring, x)


>>> def ab(lelist, lestring):
    return [e for e in lelist if e in lestring]

>>> import timeit
>>> t_ab = timeit.Timer("ab(lelist, lestring)", setup="from __main__ import lelist, lestring, ab")
>>> t_uz = timeit.Timer("uz(lelist, lestring)", setup="from __main__ import lelist, lestring, uz")
>>> t_joe = timeit.Timer("joe(lelist, lestring)", setup="from __main__ import lelist, lestring, joe")
>>> t_ab.timeit(100000)
0.09391469893125759
>>> t_uz.timeit(100000)
0.1528471407273173
>>> t_joe.timeit(100000)
1.4272649857800843
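
For anyone who wants to rerun the comparison outside the interactive prompt, here is a minimal sketch of a standalone script. It times only the uz and ab functions from above; the helper name run_benchmark and its number parameter are made up for this example, and the numbers you get will depend on your machine and Python version.

```python
import timeit

lestring = "Text123"
lelist = ["Text", "foo", "bar"]


def uz(lelist, lestring):
    # Explicit loop with one str.count method call per item.
    for x in lelist:
        if lestring.count(x):
            return 'Yep. "%s" contains characters from "%s" item.' % (lestring, x)
    return None


def ab(lelist, lestring):
    # List comprehension using the `in` operator instead of a method call.
    return [e for e in lelist if e in lestring]


def run_benchmark(number=100000):
    # Hypothetical helper: maps each function name to the total seconds
    # taken by `number` calls.
    results = {}
    for func in (uz, ab):
        results[func.__name__] = timeit.timeit(
            lambda: func(lelist, lestring), number=number
        )
    return results


if __name__ == "__main__":
    for name, seconds in sorted(run_benchmark().items()):
        print("%s: %.4f s" % (name, seconds))
```

Passing a callable to timeit.timeit avoids the "from __main__ import ..." setup string, at the cost of one extra lambda call per iteration for every candidate.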

The solution Jamie posted in the comments is slower for shorter strings. Here is the test result:

>>> import itertools
>>> def jamie(lelist, lestring):
    return next(itertools.chain((e for e in lelist if e in lestring), (None,))) is not None

>>> t_jamie = timeit.Timer("jamie(lelist, lestring)", setup="from __main__ import lelist, lestring, jamie")
>>> t_jamie.timeit(100000)
0.22237164127909637
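
The itertools.chain call in jamie only exists to supply a fallback value when nothing matches. As a rough sketch, the default argument of next does the same job; the function name first_match is made up for this example, and I have not timed this variant against the others.

```python
lestring = "Text123"
lelist = ["Text", "foo", "bar"]


def first_match(lelist, lestring):
    # next() returns the default (None) when the generator is exhausted,
    # so no chained sentinel tuple is needed.
    return next((e for e in lelist if e in lestring), None)


found = first_match(lelist, lestring)
missing = first_match(["foo", "bar"], lestring)

print(found is not None)    # True
print(missing is not None)  # False
```

Like jamie, this stops at the first item that matches instead of scanning the whole list.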

If you need Boolean values, for shorter strings just modify the LC expression above. The result is a list of True values, which is empty (and therefore falsy) when nothing matches:

[e in lestring for e in lelist if e in lestring]

Or, for longer strings, you can do the following:

>>> next(e in lestring for e in lelist if e in lestring)
True

or

>>> any(e in lestring for e in lelist)
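
To show how the last two expressions differ when nothing matches, here is a small sketch. The wrapper name contains_any is made up for this example; the rest is plain built-in behaviour.

```python
def contains_any(lelist, lestring):
    # any() stops at the first item found in lestring and returns
    # False when no item matches.
    return any(e in lestring for e in lelist)


print(contains_any(["Text", "foo", "bar"], "Text123"))  # True
print(contains_any(["foo", "bar"], "Text123"))          # False

# Without a default, next() raises StopIteration when nothing matches.
try:
    next(e in "Text123" for e in ["foo", "bar"] if e in "Text123")
    raised = False
except StopIteration:
    raised = True

print(raised)  # True
```

So any is the safer choice when a no-match case is possible, since it always returns a Boolean.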