Toussaint Louverture Toussaint Louverture - 1 year ago 45
Python Question

Remove a string that is a substring of other string in the list WITHOUT changing original order of the list

I have a list.

the_list = ['Donald Trump has', 'Donald Trump has small fingers', 'What is going on?']

I'd like to remove "Donald Trump has" from
because it's a substring of other list element.

Here is an important part. I want to do this without distoring the order of the original list.

The function I have (below) distorts the order of the original list. Because it sorts the list items by its length first.

def substr_sieve(list_of_strings):
dups_removed = list_of_strings[:]
for i in xrange(len(list_of_strings)):
list_of_strings.sort(key = lambda s: len(s))
while j <= len(list_of_strings)-1:
if list_of_strings[i] in list_of_strings[j]:
return dups_removed

cco cco
Answer Source

You can do this without sorting:

the_list = ['Donald Trump has', "I've heard Donald Trump has small fingers",
            'What is going on?']

def winnow(a_list):
    keep = set()
    for item in a_list:
        if not any(item in other for other in a_list if item != other):
    return [ item for item in a_list if item in keep ]


Sorting may allow fewer comparisons overall, but that seems highly data-dependent, and could be a premature optimization.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download