Jb_Eyd - 24 days ago
Python Question

Most efficient way to compare words in list / dict in Python

I have the following sentence and dict:

sentence = "I love Obama and David Card, two great people. I live in a boat"

dico = {
'dict1':['is','the','boat','tree'],
'dict2':['apple','blue','red'],
'dict3':['why','Obama','Card','two'],
}


I want to count, for each list in the dict, how many of its words appear in the sentence. The brute-force method is the following:

classe_sentence = []
text_splited = sentence.split(" ")
dic_keys = dico.keys()
for key_dics in dic_keys:
    for values in dico[key_dics]:
        if values in text_splited:
            classe_sentence.append(key_dics)

from collections import Counter
Counter(classe_sentence)


Which gives the following output:

Counter({'dict1': 1, 'dict3': 2})


However, it's not efficient at all, since there are two nested loops and each word is compared against the whole list. I was wondering if there is a faster way to do that, maybe using itertools. Any ideas?

Thanks in advance!

Answer

You can use the set data type for all your comparisons, and the set.intersection method to get the number of matches.

It will increase the algorithm's efficiency, but it will only count each word once, even if it shows up in several places in the sentence.
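As a small illustration of both points, here is a minimal sketch on made-up data (the names words and vocab are hypothetical, not from the question): converting a list to a set collapses repeated words, and intersection keeps only the elements common to both sets.

```python
# Hypothetical toy data, chosen only to illustrate set behaviour.
words = "red blue red green".split()   # 'red' appears twice
vocab = {"red", "green", "yellow"}

unique_words = set(words)              # duplicates collapse to 3 distinct words
common = vocab.intersection(unique_words)

print(len(words), len(unique_words))   # 4 3
print(sorted(common))                  # ['green', 'red']
```

Membership tests on a set are hash lookups, which is where the speedup over scanning a list comes from.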

sentence = set("I love Obama and David Card, two great people. I live in a boat".split())

dico = {
'dict1':{'is','the','boat','tree'},
'dict2':{'apple','blue','red'},
'dict3':{'why','Obama','Card','two'}
}


results = {}
for key, words in dico.items():
    results[key] = len(words.intersection(sentence))
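If repeated words should be counted every time they occur, one possible variation (a sketch, not part of the answer above) keeps the sentence as a Counter of tokens instead of a set. The helper name count_matches is hypothetical. Like the code above, it splits on whitespace only, so a token such as 'Card,' keeps its comma and does not match 'Card'.

```python
from collections import Counter

def count_matches(sentence, dico):
    """Count, per key, how many sentence tokens appear in that key's word set.

    Repeated tokens are counted each time they occur. Tokens come from a
    plain whitespace split, so punctuation stays attached to the word.
    """
    token_counts = Counter(sentence.split())
    # A Counter returns 0 for missing keys, so absent words add nothing.
    return {
        key: sum(token_counts[word] for word in words)
        for key, words in dico.items()
    }

sentence = "I love Obama and David Card, two great people. I live in a boat"
dico = {
    'dict1': {'is', 'the', 'boat', 'tree'},
    'dict2': {'apple', 'blue', 'red'},
    'dict3': {'why', 'Obama', 'Card', 'two'},
}

print(count_matches(sentence, dico))
# {'dict1': 1, 'dict2': 0, 'dict3': 2}
```

On this sentence the totals are the same as with set.intersection, because no matching word is repeated. A sentence such as "boat boat tree" would give 3 for dict1 here, versus 2 with the set version.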