ironsand ironsand - 3 months ago 20
Ruby Question

How to use postgres AVG() function with order clause from Rails

I have a model

Company
and
Company
has many
DailyData
.
And
DailyData
has columns
volume
and
date


To calculate average volume of recent 10 business days I wrote like:

class Array
def sum
inject(0) { |result, el| result + el }
end

def mean
sum.to_d / size
end
end

company = Company.first
company.daily_data.order(date: :desc).limit(10).pluck(:volume).mean


This code works fine, but I want to use postgres
AVG()
function.

company.daily_data.select('AVG(volume) as average_volume').order(date: :desc)


This code ends up with error:

PG::GroupingError: ERROR: column "daily_data.date" must appear in the GROUP BY clause or be used in an aggregate function


But If I put
.group(:date)
in method chain, the sql reurns multiple results.
How can I get recent 10 volumes average value by using postgresql
AVG()
function?

Answer

An ActiveRecord query like this:

company.daily_data.select('AVG(volume) as average_volume').order(date: :desc)

doesn't really make much sense. avg is an aggregate function in SQL so it needs to operate on a groups of rows. But you're not telling the database how to group the rows, you're telling the database to compute the average volume over the entire table and then order that single value by something that doesn't exist in the final result set.

Throwing a limit in:

company.daily_data
       .select('AVG(volume) as average_volume')
       .order(date: :desc)
       .limit(10)

won't help either because the limit is applied after the order and by then you've already confused the database with your attempted avg(volume).

I'd probably use a derived table if I was doing this in SQL, something like:

select avg(volume) as average_volume
from (
  select volume
  from where_ever...
  where what_ever...
  order by date desc
  limit 10
) dt

The derived table in the FROM clause finds the volumes that you want and then the overall query averages those 10 volumes.

Alternatively, you could use a subquery to grab the rows of interest:

select avg(volume) as average_volume
from where_ever...
where id in (
  select id
  from where_ever...
  where what_ever...
  order by date desc
  limit 10
)

The subquery approach is fairly straight forward to implement with ActiveRecord, something like this:

ten_most_recent = company.daily_data.select(:id).order(:date => :desc).limit(10)
company.daily_data.where(:id => ten_most_recent).average(:volume)

If you throw a to_sql call on the end of the second line you should see something that looks like the subquery SQL.

You can also make the derived table approach work with ActiveRecord but it is a little less natural. There is a from method in ActiveRecord that will take an ActiveRecord query to build the from (select ...) derived table construction but you'll want to be sure to manually name the derived table:

ten_most_recent = company.daily_data.select(:volume).order(:date => :desc).limit(10)
AnyModelAtAll.from(ten_most_recent, 'dt').average('dt.volume')

You have to use a string argument to average and include the dt. prefix to keep ActiveRecord from trying to add its own table name.

Of course you'd hide all this stuff in a method somewhere so that you could hide the details. Perhaps an extension method on the daily_data association:

has_many :daily_data, ... do
  def average_volume(days)
    #...
  end
end

so that you could say things like:

company.daily_data.average_volume(11)
Comments