Clint Clint - 12 days ago 5
Python Question

Python __sizeof__() string is larger than __sizeof__() tuple containing that string

The following code produces the given output.

import sys

print('ex1:')
ex1 = ('Hello')
print('\t', ex1.__sizeof__())

print('\nex2:')
ex2 = ('Hello', 53)
print('\t', ex2.__sizeof__())


Output:

ex1:
54

ex2:
40


Why does __sizeof__() print a smaller result when a second element is considered? Shouldn't the output be larger? I realize from this answer that I should be using sys.getsizeof(), but the behavior seems odd nonetheless. I'm using Python 3.5.2.

As @Herbert pointed out, ('Hello') takes up more memory than ('Hello',), which is a tuple. Why is this?

Answer

This is mostly because a tuple's __sizeof__ does not include the sizes of its elements. It counts only the tuple's own fixed-size header plus one PyObject * pointer per element.

In the PyTuple_Type struct, which holds the type information for tuple objects, we can see that the tp_itemsize field is sizeof(PyObject *):

PyTypeObject PyTuple_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "tuple",                                     /* tp_name */
    sizeof(PyTupleObject) - sizeof(PyObject *),  /* tp_basicsize */
    sizeof(PyObject *),                          /* tp_itemsize: one pointer per element */
    /* ... remaining fields omitted ... */
};

This is the value that gets multiplied by the number of items in the tuple instance. We can see this in object_sizeof, the __sizeof__ method that tuples inherit from object:

static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;

    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;
    res += self->ob_type->tp_basicsize;

    return PyLong_FromSsize_t(res);
}

Note how isize (obtained from tp_itemsize) is multiplied by Py_SIZE(self), a macro that reads ob_size, which for a tuple is the number of elements it contains.
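The same arithmetic can be reproduced from Python, because every type exposes tp_basicsize and tp_itemsize as __basicsize__ and __itemsize__. What follows is only a minimal sketch: the helper name object_sizeof_py is made up for illustration, and the exact byte counts depend on the interpreter version and platform.

```python
def object_sizeof_py(obj):
    """Mirror the arithmetic of the C object_sizeof shown above (illustrative helper)."""
    tp = type(obj)
    res = 0
    isize = tp.__itemsize__        # corresponds to tp_itemsize
    if isize > 0:
        res = len(obj) * isize     # for a tuple, len() matches Py_SIZE(self)
    res += tp.__basicsize__        # corresponds to tp_basicsize
    return res


for t in [(), ('Hello',), ('Hello', 53)]:
    # The last two numbers on each line should agree.
    print(len(t), object_sizeof_py(t), t.__sizeof__())
```

Each extra element adds exactly tuple.__itemsize__ bytes (the size of one pointer), no matter what the element is.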

This is why, even if we create a large string inside the tuple:

t = ("Hello" * 2 ** 10,)
t[0].__sizeof__()  # 5169

the size of the tuple stays the same as one with simply "Hello" inside:

t.__sizeof__()  # 32
t2 = ("Hello",)
t2.__sizeof__() # 32

For strings, by contrast, every character increases the number returned by str.__sizeof__. This is what gives the false impression that "Hello" is larger than the tuple containing it: the tuple's figure counts only a pointer to the string, not the string itself.
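To make this concrete, here is a small sketch. Note that ('Hello') without a trailing comma is just a parenthesized string, so ex1 in the question is a str rather than a tuple. The variable names are illustrative, and the printed byte counts vary by Python version and platform.

```python
s = ('Hello')     # parentheses only group the expression; this is a str
t = ('Hello',)    # the trailing comma is what makes a tuple
print(type(s).__name__, s.__sizeof__())
print(type(t).__name__, t.__sizeof__())

big = ('Hello' * 2 ** 10,)
# Both tuples hold exactly one pointer, so their own sizes match
# even though the strings they point to differ greatly in size.
print(t.__sizeof__() == big.__sizeof__())
print(big[0].__sizeof__() > s.__sizeof__())
```

Comparing a string's size with a tuple's size is therefore comparing two different things: the string reports its character data, while the tuple reports only its pointers.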

Just for completeness, unicode__sizeof__ is the function that computes this. It essentially multiplies the length of the string by the character size (1, 2 or 4 bytes, depending on the kind of characters the string contains) and adds the size of the fixed header.
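The per-character cost can be observed by comparing two strings that differ by a single character. This is a rough sketch, assuming CPython 3.3 or later, where strings use the compact 1/2/4-byte representation. bytes_per_char is a hypothetical helper name, not part of any library.

```python
def bytes_per_char(ch, n=10):
    """Return how much str.__sizeof__ grows when one more copy of ch is appended."""
    return (ch * (n + 1)).__sizeof__() - (ch * n).__sizeof__()


print(bytes_per_char('a'))            # ASCII character
print(bytes_per_char('\u0394'))       # GREEK CAPITAL LETTER DELTA, outside Latin-1
print(bytes_per_char('\U0001F600'))   # a character outside the Basic Multilingual Plane
```

On CPython this should print 1, 2 and 4. Taking the difference cancels out the fixed header, so only the per-character width remains.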
