JoshK - 7 months ago
Python Question

Eliminating Words Based On Letters

I have a dictionary and an alphabet:

import string
alphabet = list(string.ascii_lowercase)
dictionary = [line.rstrip('\n') for line in open("dictionary.txt")]


In a function, I remove a letter from the alphabet

alphabet.remove(letter)


Now, I want to filter through the dictionary to eliminate words if they contain a letter not in the alphabet.

I tried for loops:

for term in dictionary:
    for char in term:
        print term, char
        if char not in alphabet:
            dictionary.remove(term)
            break


However, this skips over certain words.
I tried filter:

dictionary = filter(term for term in dictionary for char in term if char not in alphabet)


But I get the error:

SyntaxError: Generator expression must be parenthesized if not sole argument

Answer

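You shouldn't modify a list (or really any container) while you are iterating over it. Removing an item shifts every later item down by one index, so the loop steps past whichever item moved into the removed item's slot, which is why some words seem to be skipped. If you iterate over a copy (dictionary[:]) and remove from the original, it works: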

for term in dictionary[:]:
    for char in term:
        print term, char
        if char not in alphabet:
            dictionary.remove(term)
            break
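
To see the skipping for yourself, here is a small sketch that compares the two loops. The word list and the allowed letters are made-up sample data, and the function names are hypothetical, not from the question:

```python
# Hypothetical sample data, chosen so that two bad words sit next to each other.
words = ["ab", "ac", "bd", "ce"]
allowed = set("bde")  # pretend these are the letters still in the alphabet


def remove_in_place(items, allowed_letters):
    """Buggy version: removes from the same list it is iterating over."""
    for term in items:
        if any(ch not in allowed_letters for ch in term):
            items.remove(term)
    return items


def remove_via_copy(items, allowed_letters):
    """Fixed version: iterates over a copy, removes from the original."""
    for term in items[:]:
        if any(ch not in allowed_letters for ch in term):
            items.remove(term)
    return items


buggy = remove_in_place(list(words), allowed)
fixed = remove_via_copy(list(words), allowed)
print(buggy)  # "ac" survives because the loop stepped over it
print(fixed)
```

In the buggy version, removing "ab" slides "ac" into index 0, and the loop has already moved on to index 1, so "ac" is never checked.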

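We can do better, though. Each dictionary.remove(term) call has to search the list again, so it is simpler and faster to build a new list containing only the words you want to keep:

alphabet_set = set(alphabet)  # set membership tests are faster than list or string lookups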
new_dictionary = [
    term for term in dictionary
    if all(c in alphabet_set for c in term)]
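
As a self-contained sketch of the same idea, the filtering can be wrapped in a function. The name filter_words and the sample words are assumptions made for illustration. The last lines also show a filter call that should do the same job, since filter expects a function and an iterable rather than a single generator expression:

```python
import string


def filter_words(words, allowed_letters):
    """Return the words made up only of letters in allowed_letters."""
    allowed = set(allowed_letters)
    return [w for w in words if all(c in allowed for c in w)]


alphabet = list(string.ascii_lowercase)
alphabet.remove("e")  # the letter being eliminated

sample = ["cat", "tree", "dog", "bee"]  # hypothetical stand-in for dictionary.txt
kept = filter_words(sample, alphabet)

# Equivalent with filter: set.issuperset accepts any iterable, including a string.
allowed = set(alphabet)
kept_via_filter = list(filter(allowed.issuperset, sample))

print(kept)
print(kept_via_filter)
```

The list(...) around filter is only needed on Python 3, where filter returns an iterator; on Python 2 it already returns a list.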

Also, it's probably wise to avoid the name dictionary for a list instance, since it is easy to confuse with dict, which is a builtin type. A name like words says what the list actually holds.
