arooo arooo - 23 days ago 7
Python Question

Check word against very large list

If I have a list of 10,000 words, what's an optimized way to check if a word is in that list that won't slow the app down to a crawl?

Should I load the words in from a file and check against that?

def check_for_word(word):
HUGE_LIST = [...] # 10,000 Words
if word in HUGE_LIST:
return True
else:
return False

Answer

Convert a list to set - strings are hashable so set can be easily created.

Look-up in set is O(1), where for list it's O(n), where n is a length of list.

HUGE_SET = set(HUGE_LIST)   # or frozenset, if it's constant and words won't be added to it
return word in HUGE_SET

Also, consider moving creation of huge list and huge set outside of function body. Right now list is recreated every time function is called.

List timings:

$ python -m timeit -s "words = list(map(str, xrange(10000)))" -n 10000 "'5000' in words"
10000 loops, best of 3: 58.2 usec per loop

Frozenset timings:

$ python -m timeit -s "words = frozenset(map(str, xrange(10000)))" -n 10000 "'5000' in words"
10000 loops, best of 3: 0.0504 usec per loop