Chris_Rands Chris_Rands - 1 month ago 13
Python Question

Dictionaries are ordered in Python 3.6

Dictionaries are ordered in Python 3.6, unlike in previous Python incarnations. This seems like a substantial change, but it's only a short paragraph in the documentation. It is described as an implementation detail rather than a language feature, but also implies this may become standard in the future.

How does the Python 3.6 dictionary implementation perform better than the older one while preserving element order? Here is the text from the documentation:


dict()
now uses a “compact” representation pioneered by PyPy. The memory usage of the new dict() is between 20% and 25% smaller compared to Python 3.5. PEP 468 (Preserving the order of **kwargs in a function.) is implemented by this. The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5). (Contributed by INADA Naoki in issue 27350. Idea originally suggested by Raymond Hettinger.)

Answer

How does the Python 3.6 dictionary implementation perform better than the older one while preserving element order?

Essentially by keeping two arrays, one holding the entries for the dictionary in the order that they were inserted and the other holding a list of indices.

In the previous implementation a sparse array of type dictionary entries had to be allocated; unfortunately, it also resulted in a lot of empty space since that array was not allowed to be more than 2/3s full. This is not the case now since only the required entries are stored and a sparse array of type integer 2/3s full is kept.

Obviously creating a sparse array of type "dictionary entries" is much more memory demanding than a sparse array for storing ints (sized 8 bytes tops in cases of really large dictionaries)

In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.


For example, the dictionary:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as:

entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

indices =  [None, 1, None, None, None, 0, None, 2]
entries =  [[-9092791511155847987, 'timmy', 'red'],
            [-8522787127447073495, 'barry', 'green'],
            [-6480567542315338377, 'guido', 'blue']]

As you can visually now see, in the original proposal, a lot of space is essentially empty to reduce collisions and make look-ups faster. With the new approach, you reduce the memory required by moving the sparseness where it's really required, in the indices.


Should you depend on it?

As noted in the documentation, this is considered an implementation detail meaning it is subject to change and you shouldn't depend on it. If you want to write code that preserves the ordering and want it to not break on previous versions you should always use OrderedDict (which should most likely eventually become a thin-wrapper around the new dict implementation.)

Comments