Edgar Aroutiounian Edgar Aroutiounian - 2 months ago 11
Python Question

Why is the order in Python dictionaries and sets arbitrary?

I don't understand how looping over a dictionary or set in python is done by 'arbitrary' order.

I mean, it's a programming language so everything in the language must be 100% determined, correct? Python must have some kind of algorithm that decides which part of the dictionary or set is chosen, 1st, second and so on.

What am I missing?

Answer

The order is not arbitrary, but depends on the insertion and deletion history of the dictionary or set, as well as on the specific Python implementation. For the remainder of this answer, for 'dictionary', you can also read 'set'; sets are implemented as dictionaries with just keys and no values.

Keys are hashed, and hash values are assigned to slots in a dynamic table (it can grow or shrink based on needs). And that mapping process can lead to collisions, meaning that a key will have to be slotted in a next slot based on what is already there.

Listing the contents loops over the slots, and so keys are listed in the order they currently reside in the table.

Take the keys 'foo' and 'bar', for example, and lets assume the table size is 8 slots. In Python 2.7, hash('foo') is -4177197833195190597, hash('bar') is 327024216814240868. Modulo 8, that means these two keys are slotted in slots 3 and 4 then:

>>> hash('foo')
-4177197833195190597
>>> hash('foo') % 8
3
>>> hash('bar')
327024216814240868
>>> hash('bar') % 8
4

This informs their listing order:

>>> {'bar': None, 'foo': None}
{'foo': None, 'bar': None}

All slots except 3 and 4 are empty, looping over the table first lists slot 3, then slot 4, so 'foo' is listed before 'bar'.

bar and baz, however, have hash values that are exactly 8 apart and thus map to the exact same slot, 4:

>>> hash('bar')
327024216814240868
>>> hash('baz')
327024216814240876
>>> hash('bar') % 8
4
>>> hash('baz') % 8
4

Their order now depends on which key was slotted first; the second key will have to be moved to a next slot:

>>> {'baz': None, 'bar': None}
{'bar': None, 'baz': None}
>>> {'bar': None, 'baz': None}
{'baz': None, 'bar': None}

The table order differs here, because one or the other key was slotted first.

The technical name for the underlying structure used by CPython (the most commonly used Python implemenation) is a hash table, one that uses open addressing. If you are curious, and understand C well enough, take a look at the C implementation for all the (well documented) details. You could also watch this Pycon 2010 presentation by Brandon Rhodes about how CPython dict works, or pick up a copy of Beautiful Code, which includes a chapter on the implementation written by Andrew Kuchling.

Note that as of Python 3.3, a random hash seed is used as well, making hash collisions unpredictable to prevent certain types of denial of service (where an attacker renders a Python server unresponsive by causing mass hash collisions). This means that the order of a given dictionary is then also dependent on the random hash seed for the current Python invocation.

Other implementations are free to use a different structure for dictionaries, as long as they satisfy the documented Python interface for them, but I believe that all implementations so far use a variation of the hash table.

Python 3.6 introduces a new dict implementation that maintains insertion order, and is faster and more memory efficient to boot. Rather than keep a large sparse table where each row references the stored hash value, and the key and value objects, the new implementation adds a smaller hash array that only references indices in dense table (one that only contains as many rows as there are actual key-value pairs), and it is the dense table that happens to list the contained items in order.

Python 2.7 and newer also provides an OrderedDict class, a subclass of dict that adds an additional data structure to record key order. At the price of some speed and extra memory, this class remembers in what order you inserted keys; listing keys, values or items will then do so in that order. It uses a doubly-linked list stored in an additional dictionary to keep the order up-to-date efficiently. See the post by Raymond Hettinger outlining the idea.

If you wanted an ordered set, you can install the oset package; it works on Python 2.5 and up.

Comments