Tal Weiss - 2 months ago
Django Python Garbage Collection woes

After 2 days of debugging, I nailed down my time hog: the Python garbage collector.

My application holds a lot of objects in memory. And it works well.

The GC does the usual rounds (I have not played with the default thresholds of (700, 10, 10)).

Once in a while, in the middle of an important transaction, the 2nd generation sweep kicks in and reviews my ~1.5M generation 2 objects.

This takes 2 seconds!
The nominal transaction takes less than 0.1 seconds.

My question is what should I do?

I can turn off generation 2 sweeps (by setting a very high threshold - is this the right way?) and the GC is obedient.
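As a rough sketch of what I mean by "a very high threshold": the helper name and the value 10**9 are arbitrary choices for illustration, not recommended settings. It relies on the third threshold behaving as documented for Python 2.6; other interpreter versions may treat it differently.

```python
import gc


def disable_gen2_sweeps(huge=10**9):
    """Raise only the third threshold so automatic gen 2 sweeps
    effectively never trigger. Returns the previous thresholds."""
    previous = gc.get_threshold()
    # Keep the gen 0 and gen 1 thresholds as they were.
    gc.set_threshold(previous[0], previous[1], huge)
    return previous


previous = disable_gen2_sweeps()
# Later, at a moment of my choosing, force a full sweep by hand:
# gc.collect(2)
```

Calling gc.set_threshold(*previous) restores the earlier behaviour.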

When should I turn them on?

We implemented a web service using Django, and each user request takes about 0.1 seconds.

Optimally, I will run these GC gen 2 cycles between user API requests. But how do I do that?

My view ends with return HttpResponse(), AFTER which I would like to run a gen 2 GC sweep.

How do I do that? Does this approach even make sense?

Can I mark the objects that NEVER need to be garbage collected so the GC will not examine them on every gen 2 cycle?

How can I configure the GC to run full sweeps when the Django server is relatively idle?

Python 2.6.6 on multiple platforms (Windows / Linux).


I believe one option would be to completely disable garbage collection and then manually collect at the end of a request as suggested here: Garbage Collection

I imagine that you could disable the GC in your settings.py file.
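A minimal sketch of that idea, assuming settings.py is imported once per server process at startup (I have not verified this on every deployment setup). Reference counting still frees most objects; only the cyclic collector is switched off, so something must still call gc.collect() at a convenient time.

```python
# settings.py (sketch): executed when Django imports the settings module
import gc

# Stop automatic collection in all generations for this process.
gc.disable()
```

You can confirm the state with gc.isenabled(), which should return False afterwards.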

If you want to run garbage collection on every request, I would suggest writing some middleware that does it in the process_response method:

import gc

class GCMiddleware(object):
    def process_response(self, request, response):
        # Run a full collection (all generations) before returning the response.
        gc.collect()
        return response
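Since a full sweep reportedly takes about 2 seconds in your case, collecting on every request is probably too expensive. Here is a hedged variant that collects at most once per interval. The class name ThrottledGCMiddleware and the 60-second value are my own assumptions for illustration, and it presumes automatic collection has been disabled elsewhere. Note that process_response runs before the response is handed back to the server, so the request that triggers the sweep still pays for it.

```python
import gc
import time


class ThrottledGCMiddleware(object):
    """Hypothetical middleware: full collection at most once per interval."""

    # Assumed value; tune it for the real workload.
    min_interval_seconds = 60.0

    def __init__(self):
        # 0.0 means "never collected", so the first response triggers a sweep.
        self.last_collect = 0.0

    def process_response(self, request, response):
        now = time.time()
        if now - self.last_collect >= self.min_interval_seconds:
            gc.collect()  # full sweep, including generation 2
            self.last_collect = now
        return response
```

The timestamp is kept per middleware instance, so each server process throttles its own sweeps independently.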