Pietro Speroni Pietro Speroni - 4 days ago 7
Python Question

Fastest way to update a dictionary in python

I have a dictionary A, and a possible entry foo. I know that A[foo] should be equal to x, but I don't know if A[foo] has been already defined. In any case if A[foo] has been defined it means that it already has the correct value.

It is faster to execute:

if foo not in A.keys():
A[foo]=x


or simply update

A[foo]=x


because by the time the computer has found the foo entry, it can as well update it. While if not I would have to call the hash table two times?

Thanks.

Answer

Just add items to the dictionary without checking for their existence. I added 100,000 items to a dictionary using 3 different methods and timed it with the timeit module.

  1. if k not in d: d[k] = v
  2. d.setdefault(k, v)
  3. d[k] = v

Option 3 was the fastest, but not by much.

[ Actually, I also tried if k not in d.keys(): d[k] = v, but that was slower by a factor of 300 (each iteration built a list of keys and performed a linear search). It made my tests so slow that I left it out here. ]

Here's my code:

import timeit

setup = """
import random
random.seed(0)
item_count = 100000
# divide key range by 5 to ensure lots of duplicates 
items = [(random.randint(0, item_count/5), 0) for i in xrange(item_count)]
"""
in_dict = """
d = {}
for k, v in items:
    if k not in d:
        d[k] = v
"""
set_default = """
d = {}
for k, v in items:
    d.setdefault(k, v)
"""
straight_add = """
d = {}
for k, v in items:
    d[k] = v
"""
print 'in_dict      ', timeit.Timer(in_dict, setup).timeit(1000)
print 'set_default  ', timeit.Timer(set_default, setup).timeit(1000)
print 'straight_add ', timeit.Timer(straight_add, setup).timeit(1000)

And the results:

in_dict       13.090878085
set_default   21.1309413091
straight_add  11.4781760635

Note: This is all pretty pointless. We get many questions daily about what's the fastest way to do x or y in Python. In most cases, it is clear that the question was being asked before any performance issues were encountered. My advice? Focus on writing the clearest program you can write and if it's too slow, profile it and optimize where needed. In my experience, I almost never get to to profile and optimize step. From the description of the problem, it seems as if dictionary storage will not be the major bottle-neck in your program.

Comments