Armen Avetisyan - 1 month ago
Python Question

How much overhead does python numpy tolist() add?

I am using a Python program that uses NumPy arrays as the standard data type for arrays. For the heavy computation I pass the arrays to a C++ library; to do so, I use pybind. However, I am required to pass Python lists to the wrapped function, so I do the conversion from NumPy array to list right before the call:

NativeSolver.vector_add(array1.tolist(), array2.tolist(), ...)

How much overhead does this conversion generate? I hope it doesn't create a whole new copy. The NumPy reference says:


Return a copy of the array data as a (nested) Python list. Data items
are converted to the nearest compatible Python type.


A lot. For simple built-in types, you can use sys.getsizeof on an object to determine the memory overhead associated with that object (for containers, this does not include the values stored in them, only the pointers used to store them).

So for example, a list of 100 smallish ints (but greater than 256, to avoid the small int cache) costs (on my 3.5.1 Windows x64 install):

>>> sys.getsizeof([0] * 100) + sys.getsizeof(0) * 100

which comes to about 3 KB of memory. If those same values were stored in a numpy array of int32s, with no Python object per number and no per-object pointers, the size would drop to roughly 100 * 4 bytes (plus another few dozen bytes for the array object overhead itself), somewhere under 500 bytes. The incremental cost for each additional small int is 24 bytes for the object (though it's free if it's in the small int cache, which covers -5 to 256 IIRC) plus 8 bytes for the pointer stored in the list, 32 bytes total, vs. 4 bytes for the C-level type: roughly 8x the storage requirement (and you're still keeping the original array around too).
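If you want a concrete sense of the gap on your own machine, a quick check along these lines shows both numbers side by side (a rough sketch only; exact byte counts vary by platform and Python/NumPy version):

import sys
import numpy as np

values = list(range(1000, 1100))   # 100 smallish ints, outside the small int cache

# Python list: pointer array plus one full int object per element
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# NumPy array: 4 bytes of raw storage per int32, plus one fixed object header
array_bytes = np.array(values, dtype=np.int32).nbytes

print(list_bytes)    # on the order of 3 KB on a 64-bit build
print(array_bytes)   # 400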

If you have enough memory to deal with it, so be it. But otherwise, you might try looking at a wrapper that lets you pass in buffer-protocol-supporting objects (numpy.array, array.array on Py3, ctypes arrays populated via memoryview slice assignment, etc.) so conversion to Python-level types isn't needed.
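As a rough sketch of what that looks like on the Python side (the exact call depends on how NativeSolver is actually wrapped, so the commented call below is only illustrative), NumPy arrays already expose the buffer protocol and can be handed over without any per-element conversion:

import numpy as np

array1 = np.arange(100, dtype=np.float64)
array2 = np.arange(100, dtype=np.float64)

# A wrapper that accepts buffer protocol objects (e.g. a pybind11 signature
# taking py::array_t<double> or py::buffer) could be called directly:
# NativeSolver.vector_add(array1, array2, ...)   # no tolist(), no per-element Python objects

# The same raw memory is visible through a zero-copy memoryview:
view = memoryview(array1)
print(view.format, view.nbytes)   # 'd' 800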