Toussaint Louverture Toussaint Louverture - 1 month ago 6
Python Question

Remove a string that is a substring of other string in the list WITHOUT changing original order of the list

I have a list.

the_list = ['Donald Trump has', 'Donald Trump has small fingers', 'What is going on?']


I'd like to remove "Donald Trump has" from
the_list
because it's a substring of other list element.

Here is an important part. I want to do this without distoring the order of the original list.

The function I have (below) distorts the order of the original list. Because it sorts the list items by its length first.

def substr_sieve(list_of_strings):
dups_removed = list_of_strings[:]
for i in xrange(len(list_of_strings)):
list_of_strings.sort(key = lambda s: len(s))
j=0
j=i+1
while j <= len(list_of_strings)-1:
if list_of_strings[i] in list_of_strings[j]:
try:
dups_removed.remove(list_of_strings[i])
except:
pass
j+=1
return dups_removed

cco cco
Answer

You can do this without sorting:

the_list = ['Donald Trump has', "I've heard Donald Trump has small fingers",
            'What is going on?']

def winnow(a_list):
    keep = set()
    for item in a_list:
        if not any(item in other for other in a_list if item != other):
            keep.add(item)
    return [ item for item in a_list if item in keep ]

winnow(the_list)

Sorting may allow fewer comparisons overall, but that seems highly data-dependent, and could be a premature optimization.

Comments