Clint - 14 days ago
Python Question

__sizeof__() string is larger than __sizeof__() tuple containing that string

The following code produces the given output.

import sys

print('ex1:')
ex1 = 'Hello'
print('\t', ex1.__sizeof__())

print('\nex2:')
ex2 = ('Hello', 53)
print('\t', ex2.__sizeof__())


Output:

ex1:
54
ex2:
40


Why does __sizeof__() print a smaller result when a second element is considered? Shouldn't the output be larger? I realize from this answer that I should be using sys.getsizeof(), but the behavior seems odd nonetheless. I'm using Python 3.5.2.

As @Herbert pointed out, 'Hello' takes up more memory than ('Hello',), which is a tuple. Why is this?

Answer

This happens because a tuple's reported size does not include the sizes of its contents. Instead, it counts the size of a PyObject * pointer multiplied by the number of elements the tuple contains. That is, the tuple holds pointers to its elements, and only those pointers are measured.
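A quick way to check the pointer claim from Python, as a minimal sketch assuming CPython (the variable names are only for illustration): two one-element tuples report the same size whatever they point at, and a second element adds exactly one pointer.

```python
import struct

# Size of one C pointer on this build (8 on typical 64-bit systems).
POINTER_SIZE = struct.calcsize("P")

small = ("Hello",)
large = ("Hello" * 2 ** 10,)
pair = ("Hello", 53)

# One-element tuples report the same size, whatever the element is.
same_size = small.__sizeof__() == large.__sizeof__()

# Adding one element grows the tuple by the size of one pointer.
growth = pair.__sizeof__() - small.__sizeof__()

print(same_size, growth == POINTER_SIZE)
```

The absolute numbers depend on the Python version and build, so the sketch compares sizes rather than hard-coding them.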

In the PyTuple_Type struct, which holds the information on the tuple type, we see that the tp_itemsize field has sizeof(PyObject *) as its value:

PyTypeObject PyTuple_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "tuple",
    sizeof(PyTupleObject) - sizeof(PyObject *),
    sizeof(PyObject *),  // <-- sizeof pointer to PyObject's

This is the value that gets multiplied by the number of items contained in the tuple instance. When we look at object_sizeof, the __sizeof__ method that tuples inherit from object, we see this clearly:

static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;

    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;  // <-- num_elements * tp_itemsize
    res += self->ob_type->tp_basicsize;

    return PyLong_FromSsize_t(res);
}

See how isize (obtained from tp_itemsize) is multiplied by Py_SIZE(self), a macro that grabs the ob_size value indicating the number of elements inside the tuple.
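The same arithmetic can be reproduced from Python, since type objects expose tp_basicsize and tp_itemsize as __basicsize__ and __itemsize__. The following is a rough sketch, assuming CPython; object_sizeof_sketch is a made-up name, and len() stands in for Py_SIZE, which is only a safe substitution for tuples.

```python
def object_sizeof_sketch(obj):
    """Python rendering of the C object_sizeof logic, for illustration only."""
    tp = type(obj)
    res = 0
    isize = tp.__itemsize__        # tp_itemsize
    if isize > 0:
        res = len(obj) * isize     # Py_SIZE(self) * tp_itemsize (tuples only)
    res += tp.__basicsize__        # tp_basicsize
    return res

# Compare the sketch against the real method for tuples of various lengths.
samples = [(), ("Hello",), ("Hello", 53), tuple(range(10))]
matches = [object_sizeof_sketch(t) == t.__sizeof__() for t in samples]
print(matches)
```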

This is why, even if we created a somewhat large string inside the tuple:

t = ("Hello" * 2 ** 10,)
t[0].__sizeof__()         # 5169

the size of the tuple:

t.__sizeof__()            # 32

equals that of one with simply "Hello" inside:

t2 = ("Hello",)
t2[0].__sizeof__()        # 54
t2.__sizeof__()           # 32 Tuple size stays the same.

For strings, each individual character increases the value returned from str.__sizeof__. This, along with the fact that tuples only store pointers, gives a misleading impression that "Hello" has a larger size than the tuple containing it.
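To get a figure that does include the contents, the element sizes have to be added by hand. Below is a minimal sketch; deep_sizeof is a hypothetical helper, not a standard library function, it only descends into tuples, and it counts an object twice if it appears twice.

```python
def deep_sizeof(obj):
    """Hypothetical helper: size of a tuple plus the sizes of its elements."""
    total = obj.__sizeof__()
    if isinstance(obj, tuple):
        # A tuple only stores pointers, so add what they point at.
        total += sum(deep_sizeof(item) for item in obj)
    return total

s = "Hello"
t = (s,)

# Shallow sizes: the tuple on its own versus the string on its own.
print(t.__sizeof__(), s.__sizeof__())

# Deep size: the tuple together with the string it refers to.
print(deep_sizeof(t))
```

Counted this way, the tuple plus its string is always larger than the string alone, which matches the intuition in the question.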

Just for completeness, unicode__sizeof__ is the function that computes this. It essentially multiplies the length of the string by the character size (1, 2 or 4 bytes per character, depending on the kind of string) and adds the size of the string's header struct.
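The per-character cost can be observed directly. This is a small sketch assuming CPython 3.3 or later, where strings use the compact 1, 2 or 4 byte representations; bytes_per_char is a made-up helper name.

```python
def bytes_per_char(ch, n=10):
    """Growth of str.__sizeof__ when one more copy of ch is appended."""
    return (ch * (n + 1)).__sizeof__() - (ch * n).__sizeof__()

ascii_width = bytes_per_char("a")            # ASCII, 1-byte kind
bmp_width = bytes_per_char("\u0394")         # Greek capital delta, 2-byte kind
astral_width = bytes_per_char("\U0001F600")  # emoji outside the BMP, 4-byte kind

print(ascii_width, bmp_width, astral_width)
```

Measuring the difference between two lengths cancels out the fixed header, so only the per-character width remains.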

The only thing I'm not getting with tuples is why tp_basicsize is listed as sizeof(PyTupleObject) - sizeof(PyObject *), which removes 8 bytes from the overall size returned.