I have a task which needs to be run on 'most' objects in my database at some regular interval (once a day, once a week, whatever). Basically this means that I have some query that looks like this running in its own thread:
for model_instance in SomeModel.objects.all():
So what I actually ended up doing is building something that you can 'wrap' a QuerySet in. It works by making a deepcopy of the QuerySet and slicing it (e.g., some_queryset[15:45]), then making another deepcopy of the original QuerySet once that slice has been completely iterated through. This matters because an evaluated QuerySet caches every object it returns, which is what exhausts memory in the naive loop; with this approach only the objects returned by the current slice are held in memory at any given time.
import copy
import logging

logger = logging.getLogger(__name__)

class MemorySavingQuerysetIterator(object):

    def __init__(self, queryset, max_obj_num=1000):
        self._base_queryset = queryset
        self.max_obj_num = max_obj_num
        self._generator = self._setup()

    def _setup(self):
        for i in range(0, self._base_queryset.count(), self.max_obj_num):
            # By making a copy of the queryset and using that to actually
            # access the objects we ensure that there are only `max_obj_num`
            # objects in memory at any given time
            smaller_queryset = copy.deepcopy(self._base_queryset)[i:i + self.max_obj_num]
            logger.debug('Grabbing next %s objects from DB' % self.max_obj_num)
            for obj in smaller_queryset.iterator():
                yield obj

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._generator)
So instead of...
for obj in SomeObject.objects.filter(foo='bar'):  # <-- Something that returns *a lot* of objects
    do_something(obj)
You would do...
for obj in MemorySavingQuerysetIterator(SomeObject.objects.filter(foo='bar')):
    do_something(obj)
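If you don't need a full iterator class, the same chunking trick can be written as a plain generator function. This is just a sketch of the equivalent approach; the name memory_saving_iterator is mine, not something from Django:

import copy

def memory_saving_iterator(queryset, max_obj_num=1000):
    # Same idea as the class above: copy the queryset for each slice so
    # only one chunk's worth of objects is cached at a time.
    for i in range(0, queryset.count(), max_obj_num):
        smaller_queryset = copy.deepcopy(queryset)[i:i + max_obj_num]
        for obj in smaller_queryset.iterator():
            yield obj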
Please note that the intention of this is to save memory in your Python interpreter. It does this by making more database queries. Usually people are trying to do the exact opposite, i.e., minimize database queries without regard to memory usage. Hopefully somebody will find this useful though.
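If you want to see the trade-off for yourself, you can count the queries Django issues. This only works with DEBUG=True, since that's when Django records executed queries, and do_something stands in for whatever processing you're doing:

from django.db import connection, reset_queries

reset_queries()
for obj in MemorySavingQuerysetIterator(SomeObject.objects.filter(foo='bar')):
    do_something(obj)
print('Queries issued: %d' % len(connection.queries))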