del del - 2 months ago 13
Python Question

Debugging reference counting memory leaks in Python C extension modules

I'm trying to determine if there are any reference counting memory leaks in a Python C extension module. Consider this very simple test extension that leaks a date object:

#include <Python.h>
#include <datetime.h>

static PyObject* memleak(PyObject *self, PyObject *args) {
    PyDate_FromDate(2000, 1, 1); /* deliberately create a memory leak */
    Py_RETURN_NONE;
}

static PyMethodDef memleak_methods[] = {
    {"memleak", memleak, METH_NOARGS, "Leak some memory"},
    {NULL, NULL, 0, NULL} /* Sentinel */
};

PyMODINIT_FUNC initmemleak(void) {
    PyDateTime_IMPORT;
    Py_InitModule("memleak", memleak_methods);
}


PyDate_FromDate returns a new reference (the caller owns it, and the object starts with a reference count of 1), and since I never call Py_DECREF, this object will never be deallocated.

However, when I call this function, the number of objects being tracked by the garbage collector doesn't seem to change before and after the function call:

Python 2.7.3 (default, Apr 10 2013, 05:13:16)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from memleak import memleak
>>> import gc
>>> gc.disable()
>>> gc.collect()
0
>>> len(gc.get_objects()) # get object count before
3581
>>> memleak()
>>> gc.collect()
0
>>> len(gc.get_objects()) # get object count after
3581


And I can't seem to find the leaked date object at all in the list of objects returned by gc.get_objects():

>>> from datetime import date
>>> print [obj for obj in gc.get_objects() if isinstance(obj, date)]
[]


Am I missing something here about how gc.get_objects() works? Is there another way to demonstrate that the memleak() function has a memory leak?

Answer

From the documentation of the gc module:

Since the collector supplements the reference counting already used in Python, you can disable the collector if you are sure your program does not create reference cycles.

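So the gc module is used only to deal with reference cycles, and it tracks only container objects that could take part in one. In your case there is no cycle, and a date object cannot hold references to other objects, so the collector never tracks it. Hence the date object isn't returned by the get_objects function.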
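One way to check whether the collector tracks a particular object is gc.is_tracked(), which is available from Python 2.7 onwards. The following is a minimal sketch, assuming CPython with the C implementation of the datetime module; the helper name tracked_report is made up for illustration and is not part of any library.

```python
import gc
from datetime import date


def tracked_report(objects):
    """Map each object's type name to whether the cycle collector tracks it.

    tracked_report is a hypothetical helper written for this example.
    """
    return dict((type(obj).__name__, gc.is_tracked(obj)) for obj in objects)


# Atomic objects (date, int, str) versus containers (list, dict).
report = tracked_report([date(2000, 1, 1), 42, "text", [], {"key": []}])

for name in sorted(report):
    print("%s: %s" % (name, report[name]))
```

Under those assumptions the date, int and str entries should report False and the list and dict entries True, which is why filtering gc.get_objects() for date instances finds nothing even though the leaked object is still alive.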

In fact, old versions of Python (before 2.0) did not have the garbage collector at all; they only used reference counting. The garbage collector was introduced to reclaim reference cycles, since these are easy to create from the Python side and a pure-Python program should not be able to leak memory that way.

To see that kind of memory leak, call the memleak function in a loop and watch the memory used by the process increase (slowly in your case, since each leaked date object is small).
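The loop can be sketched roughly as follows. This is only an illustration, assuming a Unix-like system where the standard resource module is available; the names peak_rss and rss_growth are invented for this example. It reports how much the peak resident set size of the process grew while a function was called repeatedly.

```python
import itertools
import resource


def peak_rss():
    """Peak resident set size of this process.

    The unit is platform dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def rss_growth(func, iterations=1000000):
    """Call func() repeatedly and return the growth in peak RSS.

    itertools.repeat avoids building a large list of loop indices,
    which would itself consume memory on Python 2.
    """
    before = peak_rss()
    for _ in itertools.repeat(None, iterations):
        func()
    return peak_rss() - before


if __name__ == "__main__":
    # Stand-in leak for demonstration: every call keeps 1 KiB alive.
    kept = []
    print(rss_growth(lambda: kept.append(bytearray(1024)), 50000))
```

With the extension from the question built and importable, the call would be rss_growth(memleak) after from memleak import memleak. Each leaked date object is only a few dozen bytes, so a large iteration count is needed before the growth stands out from normal allocator noise, and a value that keeps rising when the iteration count is raised is the sign to look for.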

There are also some third-party libraries that can be used to profile memory usage; see the question "Which Python memory profiler is recommended?" on Stack Overflow.