Matt Fisher Matt Fisher - 2 months ago 12
Python Question

Python - Remove any element from a list of strings that is a substring of another element

So starting with a list of strings, as below


string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']


I want to remove any element from the list that is a substring of another element, giving the result for instance...


string_list = ['resting', 'looked', 'spit']


I have some code that acheives this but it's embarrassingly ugly and probably needlessly complex. Is there a simple way to do this in Python?

Answer

First building block: substring.

You can use in to check:

>>> 'rest' in 'resting'
True
>>> 'sing' in 'resting'
False

Next, we're going to choose the naive method of creating a new list. We'll add items one by one into the new list, checking if they are a substring or not.

def substringSieve(string_list):
    out = []
    for s in string_list:
        if not any([s in r for r in string_list if s != r]):
            out.append(s)
    return out

You can speed it up by sorting to reduce the number of comparisons (after all, a longer string can never be a substring of a shorter/equal length string):

def substringSieve(string_list):
    string_list.sort(key=lambda s: len(s), reverse=True)
    out = []
    for s in string_list:
        if not any([s in o for o in out]):
            out.append(s)
    return out