durga durga - 1 month ago 20
Python Question

python: list lookup vs dict lookup

I wrote a piece of code where the list size increases with each iteration and iterations can reach almost 100000 times.

sample :

def do_something():
Lst.append(k)

while N < 100000:
if k not in Lst:
do_something()


now, I noticed that this method took ages to finish . Please note, I did set setrecursionlimit() . Infact embarrassing enough, the program kept running for hrs.

later, while trying to find methods to optimise the code, i converted the Lst to a Dct. So the code looked something like:

def do_something():
Dct[k] = True

while N < 100000:
if Dct[k] == False:
do_something()


the code ran drastically faster. After reading few conversations on SOF(Repeatedly appending to a large list (Python 2.6.6)) , I realised its not list which is slow, but rather how memory is handled for larger list data. This website https://wiki.python.org/moin/TimeComplexity , sheds light on list and dict look-up time complexity. for list its O(n) where as a Dct lookup is O(1). Is it why Dct performs better ? How exaclty are list lookup and Dict lookup performed?

Answer Source

Yes, dictionary lookups take constant time. Your if k not in Lst may have to scan the whole list to see if the number is not yet in the list, before appending. It is this scan that makes list containment tests take O(n) time, and is what is killing your algorithm.

A python dictionary on the other hand, uses a hash table to test for membership. Each key is hashed (reduced to a number), with the number then being turned into in index into the table. If the key found at that location is equal to the key you are testing with, a match is found. Hashing can lead to collisions (two values hashing to the same table index), but the Python dictionary implementation has an algorithm to then look for a next slot in an efficient manner. If an empty slot is found, the containment test has failed, the key is not present.

So, to test if k is in your dictionary, for most tests just 1 calculation is needed. For some, a few more tests could be required. But on average, the lookup time is constant.

If you are curious, and understand C well enough, take a look at the C implementation for all the (well documented) details. You could also watch this Pycon 2010 presentation by Brandon Rhodes about how CPython dict works, or pick up a copy of Beautiful Code, which includes a chapter on the implementation written by Andrew Kuchling.

You way want to look at the set() type; this is, like a dictionary, an unordered collection with O(1) membership tests, but just the values, no keys:

some_set = set()

def do_something():
    some_set.add(k)

while N < 100000:
    if k not in some_set:
        do_something()

Internally, set() objects use a hash table as well.