user1362058 user1362058 - 22 days ago 7
Python Question

Efficiently check if one string contain all character of another

I have two strings: one a word and one a scramble of letters. I want to see if this scramble of letters holds enough letters to spell the word. I have come up with an algorithm to do this but it isn't efficient enough and I was hoping I could get some help making it faster.

Here's what I have so far:

s1 = 'hypochondriac'
s2 = 'yqhpwoewnqlchpijcdrxpoa'

temp = list(s1)
for X in s2:
for Y in temp:
if X == Y:
temp.remove(X)
X = '@'
if temp == []:
print('Found ', s1)


I have an issue where once X matches I need to increment X but I didn't know how so I just take it out of the equation by making it the at symbol. I tried using break but it doesn't reach far enough to break the to the s2 loop. Either way, I'm pretty sure this double for loop idea is super slow compared to what someone with some experience would use. Any ideas?

Answer

Your code is not efficient, no, because you iterate in a double loop. For each letter in s1, in the worst-case scenario (no matches) you loop over all of s2.

Use a Counter object instead; these act as multi-sets, where you can both test if a character is present in O(1) time and manage remaining counts:

from collections import Counter

def contains(s1, s2):
    s2set = Counter(s2)
    for c in s1:
        count = s2set[c]
        if not c:
            return False
        if count == 1:
            del s2set[c]
        else:
            s2set[c] = count - 1
    return True

You can also turn s1 into a multi-set, and check if the multi-set for s2 contains enough letters for each entry:

def contains(s1, s2):
    s1set = Counter(s1)
    s2set = Counter(s2)
    for c, count in s1set.items():
        if count > s2set[c]:
            return False
    return True

The latter can be reduced further with the all() function, which returns False early if any of the results it is passed is False, True otherwise:

def contains(s1, s2):
    s2set = Counter(s2)
    return all(count <= s2set[c] for c, count in Counter(s1).items())

In all these you only have to iterate over both s1 and s2 once (either directly or to produce the multi-set).

Demo of the latter:

>>> from collections import Counter
>>> def contains(s1, s2):
...     s2set = Counter(s2)
...     return all(count <= s2set[c] for c, count in Counter(s1).items())
...
>>> s1 = 'hypochondriac'
>>> s2 = 'yqhpwoewnqlchpijcdrxpoa'
>>> contains(s1, s2)
True
>>> contains(s1 + 'b', s2)
False
Comments