a_Fraley a_Fraley - 4 months ago 19
Python Question

Is there a way to modify datetime objects through the Django ORM Query?

We've a Django, Postgresql database that contains objects with:

object_date = models.DateTimeField()


as a field.

We need to count the objects by hour per day, so we need to remove some of the extra time data, for example: minutes, seconds and microseconds.

We can remove the extra time data in python:

query = MyModel.objects.values('object_date')
data = [tweet['tweet_date'].replace(minute=0, second=0, microsecond=0) for tweet in query


Which leaves us with a list containing the date and hour.

My Question: Is there a better, faster, cleaner way to do this in the query itself?

Answer

If you simply want to obtain the dates without the time data, you can use extra to declare calculated fields:

query = MyModel.objects
    .extra(select={
        'object_date_group': 'CAST(object_date AS DATE)',
        'object_hour_group': 'EXTRACT(HOUR FROM object_date)'
    })
    .values('object_date_group', 'object_hour_group')

You don't gain too much from just that, though; the database is now sending you even more data.

However, with these additional fields, you can use aggregation to instantly get the counts you were looking for, by adding one line:

query = MyModel.objects
    .extra(select={
        'object_date_group': 'CAST(object_date AS DATE)',
        'object_hour_group': 'EXTRACT(HOUR FROM object_date)'
    })
    .values('object_date_group', 'object_hour_group')
    .annotate(count=Count('*'))

Alternatively, you could use any valid SQL to combine what I made two fields into one field, by formatting it into a string, for example. The nice thing about doing that, is that you can then use the tuples to construct a Counter for convenient querying (use values_list()).

This query will certainly be more efficient than doing the counting in Python. For a background job that may not be so important, however.

One downside is that this code is not portable; for one, it does not work on SQLite, which you may still be using for testing purposes. In that case, you might save yourself the trouble and write a raw query right away, which will be just as unportable but more readable.

Update

As of 1.10 it is possible to perform this query nicely using expressions, thanks to the addition of TruncHour. Here's a suggestion for how the solution could look:

from collections import Counter
from django.db.models import Count
from django.db.models.functions import TruncHour

counts_by_group = Counter(dict(
    MyModel.objects
        .annotate(object_group=TruncHour('object_date'))
        .values_list('object_group')
        .annotate(count=Count('object_group'))
)) # query with counts_by_group[datetime.datetime(year, month, day, hour)]

It's elegant, efficient and portable. :)

Comments