Vigneshwaren Vigneshwaren - 1 month ago
Python Question

Understanding memory allocation for large integers in Python

How does Python allocate memory for large integers?

An int type has a size of 28 bytes, and as I keep increasing the value of the int, the size increases in increments of 4 bytes.


  1. Why 28 bytes initially for any value as low as 1?

  2. Why increments of 4 bytes?



PS: I am running Python 3.5.2 on an x86_64 (64-bit) machine. I am looking for any pointers/resources/PEPs on how the (3.0+) interpreters handle such huge numbers.

Code illustrating the sizes:

>>> a=1
>>> print(a.__sizeof__())
28
>>> a=1024
>>> print(a.__sizeof__())
28
>>> a=1024*1024*1024
>>> print(a.__sizeof__())
32
>>> a=1024*1024*1024*1024
>>> print(a.__sizeof__())
32
>>> a=1024*1024*1024*1024*1024*1024
>>> a
1152921504606846976
>>> print(a.__sizeof__())
36

Answer

Why 28 bytes initially for any value as low as 1?

I believe @bgusach answered that completely; Python uses a struct to represent objects, including longs:

struct _longobject {
    PyObject_VAR_HEAD
    digit ob_digit[1];
};

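PyObject_VAR_HEAD is a macro that expands to the standard variable-size object header (a reference count, a pointer to the type, and the item count ob_size), and ob_digit holds the value of the number. On a 64-bit build each of those three header fields takes 8 bytes, so the header is 24 bytes; adding one 4-byte digit gives the 28 bytes you see. That fixed overhead comes from the struct and is the same for small and large Python numbers.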
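To make the arithmetic concrete, here is a small sketch you can run on CPython. It assumes the header behind PyObject_VAR_HEAD consists of three fields (a reference count, a type pointer and an item count) and uses struct.calcsize to get their sizes on your platform; int.__basicsize__ and sys.int_info are what the interpreter itself reports. The variable names are mine, not CPython's.

```python
import struct
import sys

# Assumed header layout behind PyObject_VAR_HEAD (may differ on other builds):
refcount_bytes = struct.calcsize("n")  # Py_ssize_t reference count
type_ptr_bytes = struct.calcsize("P")  # pointer to the type object
count_bytes = struct.calcsize("n")     # Py_ssize_t item count (ob_size)
assumed_header = refcount_bytes + type_ptr_bytes + count_bytes

reported_header = int.__basicsize__      # header size the interpreter reports
digit_bytes = sys.int_info.sizeof_digit  # bytes used by one digit

print("assumed header: ", assumed_header)
print("reported header:", reported_header)
print("one digit:      ", digit_bytes)
print("sizeof(1):      ", (1).__sizeof__())
```

If the assumed and reported header sizes agree, the size of 1 should be exactly the header plus one digit.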

Why increments of 4 bytes?

Because the allocation size is computed from sizeof(digit) times the number of digits the value needs; you can see that in _PyLong_New:

/* Number of bytes needed is: offsetof(PyLongObject, ob_digit) +
   sizeof(digit)*size.  Previous incarnations of this code used
   sizeof(PyVarObject) instead of the offsetof, but this risks being
   incorrect in the presence of padding between the PyVarObject header
   and the digits. */
if (size > (Py_ssize_t)MAX_LONG_DIGITS) {
    PyErr_SetString(PyExc_OverflowError,
                    "too many digits in integer");
    return NULL;
}
result = PyObject_MALLOC(offsetof(PyLongObject, ob_digit) +
                         size*sizeof(digit));

digit is defined in the same header file as struct _longobject, as a typedef for uint32_t:

typedef uint32_t digit;

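and sizeof(uint32_t) is 4 bytes, which is the amount by which the size grows each time another digit is needed. With this typedef each digit stores 30 bits of the value, which is why the size in your session first grows at 2**30 (1024*1024*1024) and again at 2**60.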
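The size formula from _PyLong_New can be mirrored in Python as a rough check. This is only a sketch: predicted_sizeof is a name I made up, it relies on sys.int_info for the digit width, and it skips zero because zero's handling may differ between CPython versions.

```python
import sys

def predicted_sizeof(n):
    """Predict n.__sizeof__() for a nonzero int on CPython (hypothetical helper)."""
    bits = sys.int_info.bits_per_digit        # value bits stored in each digit
    digit_bytes = sys.int_info.sizeof_digit   # storage bytes taken by each digit
    ndigits = -(-abs(n).bit_length() // bits)  # ceiling division: digits needed
    # Same shape as offsetof(PyLongObject, ob_digit) + size*sizeof(digit)
    return int.__basicsize__ + ndigits * digit_bytes

samples = (1, 1024, 2**30 - 1, 2**30, 2**60 - 1, 2**60)
for n in samples:
    print(n, predicted_sizeof(n), n.__sizeof__())
```

On a build with 30-bit digits the two columns should agree and step up at 2**30 and 2**60, matching the session in the question.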


Of course, this is just how CPython has chosen to implement it; it is an implementation detail, and as such you won't find much information in PEPs (the python-dev mailing list would hold implementation discussions if you can find the appropriate thread).

Either way, you might find differing behavior in other popular implementations, so don't take this one for granted.
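If you want code that depends on these numbers to be explicit about that, one option is to check which implementation is running first. A minimal sketch, assuming only the standard platform and sys modules; int_layout_summary is a hypothetical helper name:

```python
import platform
import sys

def int_layout_summary():
    """Describe the int digit layout, but only claim anything on CPython."""
    impl = platform.python_implementation()  # e.g. "CPython" or "PyPy"
    if impl != "CPython":
        return impl + ": integer layout not covered by this answer"
    info = sys.int_info
    return "CPython: {} value bits per digit, {} bytes per digit".format(
        info.bits_per_digit, info.sizeof_digit)

print(int_layout_summary())
```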