Nick Nick - 1 month ago 10
Ruby Question

How to get the most recent rows in a group

I have a Rails 4.2.5.x project running PostGres. I have a table with a similar structure to this:

id, contact_id, date, domain, f1, f2, f3, etc
1, ABC, 01-01-16, abc.com, 1, 2, 3, ...
2, ABC, 01-01-15, abc.com, 1, 2, 3, ...
3, ABC, 01-01-14, abc.com, 1, 2, 3, ...
4, DEF, 01-01-15, abc.com, 1, 2, 3, ...
5, DEF, 01-01-14, abc.com, 1, 2, 3, ...
6, GHI, 01-11-16, abc.com, 1, 2, 3, ...
7, GHI, 01-01-16, abc.com, 1, 2, 3, ...
8, GHI, 01-01-15, abc.com, 1, 2, 3, ...
9, GHI, 01-01-14, abc.com, 1, 2, 3, ...
...
...
99, ZZZ, 01-01-16, xyz.com, 1, 2, 3, ...


I need to query to find:


  • The most recent rows by
    date

  • filtered by domain

  • for a distinct
    contact_id
    (grouped by?)

  • row-limited result. In this example, I'm not adding this complication but this needs to be factored in. If there are 50 distinct contacts, I am only interested in the top 3 by date.

  • ID is the primary key.

  • there are indexes on the other columns

  • the
    fX
    columns indicate other data in the model that is needed (such as contact email, for example).



In MySQL, this would be a simple
SELECT * FROM table WHERE domain='abc.com' GROUP BY contact_id ORDER BY date DESC
, however, PostGres complains, in this case, that:

ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column "table.id" must appear in the GROUP BY clause or be used in an aggregate function


I expect to get back 3 rows; 1, 4 and 6. Ideally, I'd like to get back the full rows in a single query... but I accept that I may need to do one query to get the IDs first, then another to
find
the items I want.

This is the closest I have got:

ExampleContacts
.select(:contact_id, 'max(date) AS max_date')
.where(domain: 'abc.com')
.group(:contact_id)
.order('max_date desc')
.limit(3)


However... this returns the
contact_id
, not the
id
. I cannot add the ID for the row.

EDIT:

Essentially, I need to get the primary key back for the row which is grouped on the non-primary key and sorted by another field.

Answer

If you want the rows, you don't need grouping. It's simply Contact.select('DISTINCT ON (contact_id)').where(domain: 'abc.com').order(date: :desc).limit(3)