Parseltongue - 5 months ago
Python Question

Python - Remove all words that contain other words in a list

I have a list populated with words from a dictionary. I want to remove every word that contains another word from the list, but only when that root word appears at the beginning of the target word.

For example, the word "rodeo" would be removed from the list because it contains the English-valid word "rode." "Typewriter" would be removed because it contains the English-valid word "type." However, the word "snicker" is still valid even if it contains the word "nick" because "nick" is in the middle and not at the beginning of the word.

I was thinking something like this:

for line in wordlist:
    if line.find(...) --

but I want that "if" statement to run through every other word in the list, check whether one of them is found at the start of the current word and, if so, remove the current word from the list so that only root words remain. Do I have to create a copy of wordlist to traverse?


I'm assuming that you only have one list from which you want to remove any elements that have prefixes in that same list.

#Important assumption here... wordlist is sorted (and not empty)

base = wordlist[0]                    #consider the first word in the list
for word in wordlist:                 #loop through the entire list checking if
    if not word.startswith(base):     # the word we're considering starts with the base
        print(base)                   #If not... we have a new base, print the current
        base = word                   #  one and move to this new one
    #else: word starts with base, so don't output it
    #and go on to the next item in the list
print(base)                           #finish by printing the last base

EDIT: Added some comments to make the logic more obvious
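
If you'd rather get the root words back as a list instead of printing them, the same idea can be wrapped in a function. This is only a sketch: the name keep_root_words is made up for illustration, and it assumes the words use a consistent case and that the list contains no empty string (every word starts with ""). Since sorted() builds a new list, the original wordlist is left alone, so you don't need to copy it yourself or remove items while looping over it.

```python
def keep_root_words(wordlist):
    """Return the words that do not start with another word in wordlist.

    Hypothetical helper for illustration; assumes consistent case and
    no empty strings in wordlist.
    """
    roots = []
    # sorted() returns a new list, so the caller's list is not modified.
    for word in sorted(wordlist):
        # In sorted order, every word that extends a root comes right
        # after that root, so only the last kept root needs checking.
        if not roots or not word.startswith(roots[-1]):
            roots.append(word)
    return roots


words = ["typewriter", "rodeo", "snicker", "type", "rode", "nick"]
print(keep_root_words(words))  # ['nick', 'rode', 'snicker', 'type']
```

Note that the result comes back in sorted order, not in the order of the original list. Exact duplicates are dropped as well, because a word always starts with itself.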