Ahmedn1 - 24 days ago
Python Question

Memory management in Django

I'm doing some analysis on my Django database. I run many queries in a loop, and some of them may return large result sets.

After a while, all 8 GB of RAM on my EC2 instance is used up and I can no longer even SSH into the machine.

I have to reboot the instance and start over.

I tried the solution mentioned here:

https://baxeico.wordpress.com/2014/09/30/optimize-django-memory-usage/

But the queryset_iterator method does not seem to work with aggregated queries.

I'm fairly sure that no single query can consume all 8 GB of RAM, so the old results must not be getting freed from memory.
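One way to check that suspicion is to log the process's peak memory after every step of the loop. The following is only a minimal sketch: `peak_rss` and `run_with_memory_log` are hypothetical helper names, not part of Django or of the code in this question, and the `resource` module is available on Unix-like systems only.

```python
import resource


def peak_rss():
    """Return this process's peak resident set size.

    The unit is platform dependent: kilobytes on Linux, bytes on macOS.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_with_memory_log(steps):
    """Run each (label, callable) pair in order and record peak memory after it."""
    readings = []
    for label, step in steps:
        step()
        readings.append((label, peak_rss()))
    return readings
```

Because this reports a high-water mark, the readings never decrease. A value that keeps climbing from one step to the next suggests that something created in each step is being retained.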

How do I force a query out of the memory before the end of its loop iteration and before executing the next query?

Here is my code:

import os

import matplotlib.pyplot as plt
from django.db.models import Count

# EVENT (the model) and EVENT_TYPE (its choices) are assumed to be
# imported from the project's own models module.


def get_users_event_distribution(monthYear, event_type=None):
    title = event_type if event_type else 'All'

    filename = 'charts/%s_%s_event_dist.png' % (monthYear, title)
    filename = filename.replace(' ', '')

    if os.path.isfile(filename):
        print 'Chart already in file %s' % (filename)
    else:
        users = None
        if event_type:
            users = EVENT.objects.filter(time__month=monthYear.month, time__year=monthYear.year, event_type=event_type).values_list('user').annotate(count=Count('id'))
        else:
            users = EVENT.objects.filter(time__month=monthYear.month, time__year=monthYear.year).values_list('user').annotate(count=Count('id'))

        uc = users.count()
        print 'We have %d users' % (uc)

        print 'Building Count Dictionary'
        # Each row is (user, count); map "events per user" -> "number of users".
        count_dict = dict()
        for u in users:
            try:
                count_dict[u[1]] += 1
            except KeyError:
                count_dict[u[1]] = 1

        print 'Built the count dictionary with %d keys' % (len(count_dict.keys()))

        fig, ax = plt.subplots(figsize=(20, 20))
        bars = plt.bar(range(len(count_dict)), count_dict.values(),
                       align='edge')
        locs, labels = plt.xticks(range(len(count_dict)), count_dict.keys())
        ax.set_ylabel('# Users')
        ax.set_xlabel('# %s Events' % (title))
        ax.set_title('%s Event Distribution' % (title))
        ax.relim()
        # update ax.viewLim using the new dataLim
        ax.autoscale_view()

        def autolabel(rects):
            """
            Attach a text label above each bar displaying its height
            """
            for rect in rects:
                height = rect.get_height()
                ax.text(rect.get_x() + rect.get_width() / 2., 1.05 * height,
                        '%d' % int(height),
                        ha='center', va='bottom')

        autolabel(bars)
        plt.savefig(filename, bbox_inches='tight', dpi=100)
        print 'saved the distribution chart to %s' % (filename)


def get_users_all_event_distribution(monthYear):
    # First the chart for all events, then one chart per event type.
    get_users_event_distribution(monthYear)
    for event_type in [choice[0] for choice in EVENT_TYPE]:
        get_users_event_distribution(monthYear, event_type)


I run get_users_all_event_distribution for different dates in a loop.

Answer

After more analysis, I found that the problem was not the queries but the matplotlib figures, as this warning states:

/usr/local/lib64/python2.7/site-packages/matplotlib/pyplot.py:524: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (matplotlib.pyplot.figure) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam figure.max_open_warning).
max_open_warning, RuntimeWarning)

The fix was to add a plt.close('all') line so that the figures are closed instead of accumulating.
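As an illustration of that fix, here is a minimal sketch of a chart-saving helper that closes its figure as soon as the file is written. `save_bar_chart` and `demo` are hypothetical names, the sample data is made up, and the sketch closes only the figure it created with `plt.close(fig)` rather than calling `plt.close('all')`. The `Agg` backend is chosen on the assumption that the script runs on a headless server.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display is needed
import matplotlib.pyplot as plt


def save_bar_chart(count_dict, filename):
    """Draw count_dict as a bar chart, write it to filename, then release the figure."""
    fig, ax = plt.subplots(figsize=(4, 4))
    try:
        keys = sorted(count_dict)
        ax.bar(range(len(keys)), [count_dict[k] for k in keys], align="edge")
        ax.set_xticks(range(len(keys)))
        ax.set_xticklabels([str(k) for k in keys])
        fig.savefig(filename, bbox_inches="tight", dpi=100)
    finally:
        # pyplot keeps a reference to every figure it creates until it is closed,
        # so close it even if drawing or saving fails.
        plt.close(fig)


def demo(n_charts=25):
    """Save n_charts charts to a temporary directory; report figures still open."""
    out_dir = tempfile.mkdtemp()
    for i in range(n_charts):
        path = os.path.join(out_dir, "chart_%d.png" % i)
        save_bar_chart({1: 10, 2: 4, 3: 1}, path)
    return out_dir, plt.get_fignums()
```

If the figures are being released, `plt.get_fignums()` stays empty however many charts are produced. `plt.close('all')` has the same effect when every open figure can be discarded at once.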