ram ram - 2 months ago 5x
Python Question

How tuples, lists and dictionaries are stored in backend in python

How tuples, lists and dictionaries are stored in backend in python?
Why cannot we change tuples once assigned? How it works in background?


How they're stored in the backend is different for each Python implementation.

But how they're stored has almost nothing to do with why you can't change tuples. You can't change tuples because they were designed to be immutable. They don't implement a __setitem__ method or an append method. In ABC terms (which is almost always what you actually care about, not the underlying implementation), they implement only Sequence, not MutableSequence. Under the covers, on most implementations, they aren't much different from lists, except that they may not know how to expand; the difference is really just that the mutating methods are not implemented.

In CPython:

Tuples and lists are a small header and a pointer to a contiguous array of pointers to PyObject. If you append to a list when it's already filled the array, it allocates a new, larger array, copies over the existing pointers, then erases the old one.

Dictionaries are a small header plus a pointer to a hash table,* where each bucket is a hash value, a key, and a value (with the keys and values being pointers to PyObject).

The details are documented as part of the C API, in the Concrete Objects Layer. The source code is in the Objects directory. (Notice that at the C API level, you can actually call PyTuple_SetItem, and even _PyTuple_Resize, but you're strongly discouraged from doing so if any Python code may be able to see the tuple.)

* Actually, in the latest versions, the hash table can be split into two parts, so that multiple dictionaries can share the same keys. See PEP 412 and the comments in dictobject.c for details on how that works.

In PyPy, Python list, tuple, and dict objects are actually implemented in RPython (a subset of Python), which is translated to C through the usual PyPy magic. The tuple and list are pretty much what you'd expect, but the dict has some clever tricks that are worth reading.*

* It basically takes the same "split table" idea further—or, more accurately, CPython borrowed PyPy's split table idea and didn't take it as far.

In Jython and IronPython, they're implemented in Java and C#, respectively. IIRC, early versions of Jython directly used Java collection types for Python types, but they stopped that a long time ago and the code is more similar to CPython (just on top of a Java Array instead of a C pointer-to-a-malloc-region). The implementation is pretty similar to CPython in the 2.5-2.7 days. I assume IronPython is similar, but I haven't looked.

In both of the partial Python-in-JavaScript implementations I know of, all three types are implemented on top of JS Objects, but they have to do some ugly stuff with the keys for dict.