almost a beginner almost a beginner - 1 month ago 16
MySQL Question

Get the value of last record of every month (Django)

My Model:

class Transaction (models.Model):
transaction_id = models.AutoField(primary_key=True)
net_monthly_transaction = models.DecimalField(max_digits = 10, decimal_places = 2, default=0)
# deposit or withdrawal (withdrawal with negative value)
amount = models.DecimalField(max_digits = 10, decimal_places = 2)
time_stamp = models.DateTimeField(default=datetime.now, blank=True)


def __str__(self): # __unicode__ on Python 2
return str(self.time_stamp) + str(self.amount) + str(self.net_monthly_transaction)


My aim is to get the value of net_monthly_transaction from the last entry in every month.

With help of S.O. I have managed to get this far:

truncate_date = connection.ops.date_trunc_sql('month', 'time_stamp')
lem = Transaction.objects.extra({'month':truncate_date}).values('month').annotate(last_record=Max('time_stamp')).values_list('net_monthly_transaction', flat=True)


The above query is suppose to get the value of net_monthly_transaction from the max time_stamp in every month.

But it doesn't.

If I create three entries for October one after the other:


  1. net_monthly_transaction = 3000

  2. net_monthly_transaction = 4000

  3. net_monthly_transaction = 5000



all 3 values will be returned by the query.

In the other hand:


  1. net_monthly_transaction = 3000

  2. net_monthly_transaction = 2000

  3. net_monthly_transaction = 1000



Then only the value 3000 is returned.

So there is a condition set somewhere based on the size of the net_monthly_transaction. I am bit lost as to how I can solve this.

Can someone please provide some direction.

Thanks in advance.

Answer

I'd solve this by using two querysets (unless the simpler approach below is an option). As long as you don't explicitly evaluate last_entries this will result in a single query when you evaluate transactions.

from django.db.models import Max
from django.db.models.functions import TruncMonth

# Selects last time_stamp for each month
last_entries = (Transaction.objects
    .annotate(tx_month=TruncMonth('time_stamp'))
    .values('tx_month')
    .annotate(last_entry=Max('time_stamp'))
    .values_list('last_entry', flat=True))

# Selects transactions with time_stamps matching last_entries
# ie. last transaction in each month
transactions = Transaction.objects.filter(
    time_stamp__in=last_entries
)

transactions is a normal queryset containing the last Transaction instance in each month. If you want a simple list of net_monthly_transaction values with no other information you can add:

net_values = transactions.values_list(
    'net_monthly_transaction', flat=True
)

One thing to be really careful about is that if two entries have identical time stamps then both of them will appear in the result set.

A simpler approach

If net_monthly_transaction is simply the sum of all amounts for a given month, then you can use something like this instead of the above

from django.db.models import Sum
from django.db.models.functions import TruncMonth

transactions = (Transaction.objects
    .annotate(month=TruncMonth('time_stamp'))
    .values('month')
    .annotate(month_net=Sum('amount')))

Now transactions contains dicts representing the last transaction for each month. Each dict has a month key containing the month and amonth_net key which contains the month's net transactions. As a bonus, you don't have to worry about entries with identical time stamps.

Of course, if net_monthly_transaction is the result of a more complex calculation then this might not be an option.

Your initial approach

Your initial query isn't working for two main reasons.

  1. This kind of query usually relies either on joining the table to itself or on a subquery in the WHERE clause. I don't know of a good way to do either of those with a single queryset using Django's ORM unless you resort to stuffing raw SQL in extra() or something along those lines. But producing a WHERE clause subquery is trivial if you use two querysets the way we did above.

  2. It doesn't make sense to use values_list() in this way

    truncate_date = connection.ops.date_trunc_sql('month', 'time_stamp')
    lem = Transaction.objects.extra({'month':truncate_date}).values('month').annotate(last_record=Max('time_stamp'))
    

    Up to this point it's similar to the last_entries queryset used above. We're selecting unique month values and the last time_stamp for each one of those months.

    When we add .values_list('net_monthly_transaction', flat=True) we're telling the query builder that we only care about the net_monthly_transaction field, so it throws away everything else and produces something like this

    SELECT "transaction"."net_monthly_transaction"
    FROM "transaction"
    GROUP BY "transaction"."net_monthly_transaction"
    

    The GROUP BY is the only thing left from the extra and annotate calls, and even that has been changed so that it doesn't do what we wanted it to do.