Clint - 4 days ago
Python Question

__sizeof__ of a str is larger than __sizeof__ of a tuple containing that string

The following code produces the given output.

import sys

print('ex1:')
ex1 = 'Hello'
print('\t', ex1.__sizeof__())

print('\nex2:')
ex2 = ('Hello', 53)
print('\t', ex2.__sizeof__())


Output:

ex1:
54
ex2:
40


Why does __sizeof__() return a smaller result when a second element is considered? Shouldn't the output be larger? I realize from this answer that I should be using sys.getsizeof(), but the behavior seems odd nonetheless. I'm using Python 3.5.2.
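For reference, here is a small check of how sys.getsizeof() relates to __sizeof__(). On CPython, sys.getsizeof() calls __sizeof__() and then adds garbage-collector bookkeeping for objects whose type is GC-tracked, such as tuples; str is not GC-tracked, so the two numbers agree for strings. Exact byte counts depend on the Python version and build, so none are hard-coded in this sketch.

```python
import sys

s = 'Hello'
t = ('Hello', 53)

# str is not tracked by the cyclic garbage collector, so getsizeof() adds nothing.
print(sys.getsizeof(s), s.__sizeof__())

# tuple is GC-tracked; on default CPython builds getsizeof() adds the GC header.
print(sys.getsizeof(t), t.__sizeof__())
```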

Also, as @Herbert pointed out, 'Hello' takes up more memory than ('Hello',), which is a tuple. Why is this?

Answer

This happens because a tuple does not include the sizes of its contents when it reports its own size. Instead, it counts the size of one pointer to PyObject for each element it contains. That is, a tuple holds pointers to the (generic) PyObjects it contains, and only those pointers contribute to its size.

In PyTuple_Type, the struct that holds the information on the tuple type, the tp_itemsize field is set to sizeof(PyObject *):

PyTypeObject PyTuple_Type = {
    PyVarObject_HEAD_INIT(&PyType_Type, 0)
    "tuple",                                     // tp_name
    sizeof(PyTupleObject) - sizeof(PyObject *),  // tp_basicsize
    sizeof(PyObject *),                          // tp_itemsize <-- sizeof pointer to PyObject
    /* ... remaining slots omitted ... */
};

On 64-bit builds of Python, sizeof(PyObject *) is 8 bytes; on 32-bit builds it is 4 bytes.
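The pointer size can be checked from Python. This minimal sketch assumes CPython, where a type's tp_itemsize is exposed as the __itemsize__ attribute; it compares that value with struct.calcsize('P'), the native pointer size, and with how much tuple.__sizeof__() grows when one element is added.

```python
import struct

# Size of a native C pointer (void *): 8 on 64-bit builds, 4 on 32-bit builds.
pointer_size = struct.calcsize('P')

# CPython exposes a type's tp_itemsize as __itemsize__.
print(pointer_size, tuple.__itemsize__)

# One extra element adds exactly one pointer slot, whatever the element is.
one = ('Hello',).__sizeof__()
two = ('Hello', 53).__sizeof__()
print(two - one)
```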

This is the value that gets multiplied by the number of items in the tuple instance. Looking at object_sizeof, the __sizeof__ implementation that tuples inherit from object (you can confirm this with object.__sizeof__ is tuple.__sizeof__), we see this clearly:

static PyObject *
object_sizeof(PyObject *self, PyObject *args)
{
    Py_ssize_t res, isize;

    res = 0;
    isize = self->ob_type->tp_itemsize;
    if (isize > 0)
        res = Py_SIZE(self) * isize;  // <-- num_elements * tp_itemsize
    res += self->ob_type->tp_basicsize;

    return PyLong_FromSsize_t(res);
}

Notice how isize (obtained from tp_itemsize) is multiplied by Py_SIZE(self), another macro that reads the ob_size value, i.e. the number of elements in the tuple.
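The arithmetic in object_sizeof can be mirrored in Python as a sanity check. The helper object_sizeof_model below is a hypothetical name, not a CPython API; it reads the __basicsize__ and __itemsize__ attributes CPython exposes on types and uses len() as a stand-in for Py_SIZE, which holds for tuples. Treat it as a tuple-only sketch: other variable-size types (int, for example) do not store their length the same way.

```python
def object_sizeof_model(obj):
    """Hypothetical helper: tp_basicsize + Py_SIZE(obj) * tp_itemsize, as in object_sizeof."""
    tp = type(obj)
    res = 0
    if tp.__itemsize__ > 0:
        # For tuples, Py_SIZE(obj) is the number of elements, i.e. len(obj).
        res = len(obj) * tp.__itemsize__
    res += tp.__basicsize__
    return res

# The answer states that tuple inherits __sizeof__ from object.
print(object.__sizeof__ is tuple.__sizeof__)

for t in [(), ('Hello',), ('Hello', 53), ("Hello" * 2 ** 10,)]:
    # The model and the real __sizeof__() should agree for every tuple.
    print(t.__sizeof__(), object_sizeof_model(t))
```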

This is why, even if we created a somewhat large string inside the tuple:

t = ("Hello" * 2 ** 10,)
t[0].__sizeof__()         # 5169

the size of the tuple:

t.__sizeof__()            # 32

is the same as that of a tuple containing just "Hello":

t2 = ("Hello",)
t2[0].__sizeof__()        # 54
t2.__sizeof__()           # 32, the tuple size stays the same

For strings, every additional character increases the value returned by str.__sizeof__. This, together with the fact that tuples only store pointers, gives the misleading impression that "Hello" is larger than the tuple containing it.
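To see the memory a tuple and its contents occupy together, you have to follow the pointers yourself. The recursive helper deep_sizeof below is a hypothetical name written only for this illustration: it adds each element's __sizeof__() to the container's own and counts an object referenced more than once only once. As a sketch it only descends into tuples, lists and sets; dicts and other containers would need extra handling.

```python
def deep_sizeof(obj, seen=None):
    """Hypothetical helper: obj.__sizeof__() plus the sizes of the objects it references."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        # Already counted, e.g. the same string stored twice in one tuple.
        return 0
    seen.add(id(obj))
    size = obj.__sizeof__()
    if isinstance(obj, (tuple, list, set, frozenset)):
        for item in obj:
            size += deep_sizeof(item, seen)
    return size

s = 'Hello'
t = (s,)
# The tuple alone is smaller than the string, but tuple plus string is larger.
print(s.__sizeof__(), t.__sizeof__(), deep_sizeof(t))
```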

For completeness, unicode__sizeof__ is the function that computes a string's size. It essentially multiplies the length of the string by the character size, which is 1, 2 or 4 bytes depending on the string's kind (the widest character it contains), and adds the size of the string object's header.
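The per-character cost can be observed directly. This sketch assumes CPython 3.3 or later, where PEP 393 stores a string with 1, 2 or 4 bytes per character depending on its widest character. It measures how much __sizeof__() grows when one more character is added; the strings are built at run time with chr(), and only differences are compared because the header size varies between versions.

```python
def per_char_cost(ch, n=10):
    # Build strings of length n and n + 1 at run time and compare their sizes.
    return (ch * (n + 1)).__sizeof__() - (ch * n).__sizeof__()

print(per_char_cost(chr(0x61)))     # U+0061 'a', ASCII: 1 byte per character
print(per_char_cost(chr(0xE9)))     # U+00E9, Latin-1: still 1 byte per character
print(per_char_cost(chr(0x100)))    # U+0100: 2 bytes per character
print(per_char_cost(chr(0x1F600)))  # U+1F600, outside the BMP: 4 bytes per character
```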

The only thing I don't understand about tuples is why tp_basicsize is listed as sizeof(PyTupleObject) - sizeof(PyObject *), which removes 8 bytes from the overall size returned.
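One likely explanation comes from how CPython declares the struct: PyTupleObject ends with a one-element array, PyObject *ob_item[1], so sizeof(PyTupleObject) already includes one item slot. Subtracting sizeof(PyObject *) leaves only the fixed header, and all item slots are then counted by Py_SIZE(self) * tp_itemsize; without the subtraction a 1-tuple would be charged for two slots. The ctypes sketch below mirrors that layout. TupleLayout is a hypothetical name, and the field list assumes a default (non-debug) CPython 3.5 build, so newer or free-threaded builds may lay the struct out differently.

```python
import ctypes

class TupleLayout(ctypes.Structure):
    """Hypothetical mirror of PyTupleObject on a default CPython 3.5 build."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),   # PyObject_VAR_HEAD: reference count
        ("ob_type", ctypes.c_void_p),      # PyObject_VAR_HEAD: pointer to the type object
        ("ob_size", ctypes.c_ssize_t),     # PyObject_VAR_HEAD: number of items (Py_SIZE)
        ("ob_item", ctypes.c_void_p * 1),  # PyObject *ob_item[1]
    ]

pointer = ctypes.sizeof(ctypes.c_void_p)
# Equivalent of sizeof(PyTupleObject) - sizeof(PyObject *).
header = ctypes.sizeof(TupleLayout) - pointer

# The subtraction removes exactly the ob_item[1] slot, leaving the header.
print(header == TupleLayout.ob_item.offset)
# On a default 64-bit 3.5 build both values should be 24; other builds may differ.
print(header, tuple.__basicsize__)
```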
