Christos Baziotis - 1 year ago 52
Python Question

# Populate dictionary from list in loop

I have the following code, which works fine, and I was wondering how to implement the same logic using a list comprehension.

def get_features(document, feature_space):
    features = {}
    for w in feature_space:
        features[w] = (w in document)
    return features


Also, am I going to get any performance improvement by using a list comprehension?

The thing is that both feature_space and document are relatively big and many iterations will run.

Edit: Sorry for not making it clear at first, both feature_space and document are lists.

• document is a list of words (a word may occur more than once!)

• feature_space is a list of labels (features)

Like this, with a dict comprehension:

def get_features(document, feature_space):
    return {w: (w in document) for w in feature_space}


The features[key] = value expression becomes the key: value part at the start, and the rest of the for loop(s) and any if statements follow in nesting order.
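To make that mapping concrete, here is a small sketch of a loop with an if statement next to its dict comprehension equivalent. The min_length filter and the sample lists are made up for illustration; they are not part of the original function.

```python
def get_features_loop(document, feature_space, min_length=3):
    # Explicit loop: the if statement sits inside the for loop.
    features = {}
    for w in feature_space:
        if len(w) >= min_length:  # hypothetical filter, for illustration only
            features[w] = (w in document)
    return features


def get_features_comp(document, feature_space, min_length=3):
    # Same logic: key: value first, then the for and if parts in the same order.
    return {w: (w in document) for w in feature_space if len(w) >= min_length}


# Made-up sample data
document = ["the", "cat", "sat", "on", "the", "mat"]
feature_space = ["cat", "dog", "on", "mat"]

print(get_features_comp(document, feature_space))
# {'cat': True, 'dog': False, 'mat': True}
```

Both versions return the same dictionary; "on" is dropped because it is shorter than min_length.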

Yes, this will give you a performance boost, because it removes the repeated lookups of the local name features and the dict.__setitem__ calls.
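If you want to measure this on your own data, a rough sketch with timeit could look like the following. The sample lists and the number of repetitions are arbitrary assumptions, and the gap may be small while document is still a list, since the membership tests then dominate the running time.

```python
import timeit


def get_features_loop(document, feature_space):
    features = {}
    for w in feature_space:
        features[w] = (w in document)
    return features


def get_features_comp(document, feature_space):
    return {w: (w in document) for w in feature_space}


# Made-up sample data: 200 words, 200 features, half of which occur in document.
document = ["word%d" % i for i in range(200)]
feature_space = ["word%d" % i for i in range(0, 400, 2)]

# Sanity check: both versions must build the same dictionary.
assert get_features_loop(document, feature_space) == get_features_comp(document, feature_space)

loop_time = timeit.timeit(lambda: get_features_loop(document, feature_space), number=100)
comp_time = timeit.timeit(lambda: get_features_comp(document, feature_space), number=100)

print("for loop:           %.4f s" % loop_time)
print("dict comprehension: %.4f s" % comp_time)
```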

Note that you need to make sure that document is a data structure that has fast membership tests. If it is a list, convert it to a set() first, for example, so that each membership test takes O(1) (constant) time on average, not the O(n) linear time of a list:

def get_features(document, feature_space):
    document = set(document)
    return {w: (w in document) for w in feature_space}


With a set, the loop is now O(K) instead of O(KN) (where N is the size of document and K the size of feature_space); building the set adds a one-time O(N) cost.
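A quick way to see the difference is to time both variants on the same input. This is only a sketch: the list sizes are made up and the exact numbers will depend on your machine and data.

```python
import timeit


def get_features_list(document, feature_space):
    # document stays a list: every membership test scans the list.
    return {w: (w in document) for w in feature_space}


def get_features_set(document, feature_space):
    # Convert once, then every membership test is a hash lookup.
    document = set(document)
    return {w: (w in document) for w in feature_space}


# Made-up sample data: 2000 words (each word occurs twice), 1000 features.
document = ["word%d" % i for i in range(1000)] * 2
feature_space = ["word%d" % i for i in range(500, 1500)]

# The set conversion must not change the result, duplicates included.
assert get_features_list(document, feature_space) == get_features_set(document, feature_space)

list_time = timeit.timeit(lambda: get_features_list(document, feature_space), number=5)
set_time = timeit.timeit(lambda: get_features_set(document, feature_space), number=5)

print("list membership: %.4f s" % list_time)
print("set membership:  %.4f s" % set_time)
```

Duplicates in document do not matter here, because only membership is tested, so converting to a set loses nothing the function needs.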