Paul Millar Paul Millar - 1 year ago 28
Ruby Question

Rails / Postgres Lookup Performance

I have a status dashboard that shows the status of remote hardware devices that 'ping' the application every minute and log their status.

class Sensor < ActiveRecord::Base
has_many :logs

def most_recent_log
logs.order("id DESC").first

class Log < ActiveRecord::Base
belongs_to :sensor

Given I'm only interested in showing the current status, the dashboard only shows the most recent log for all sensors. This application has been running for a long time now and there are tens of millions of

The problem I have is that the dashboard takes around 8 seconds to load. From what I can tell, this is largely because there is an N+1 Query fetching these logs.

Completed 200 OK in 4729.5ms (Views: 4246.3ms | ActiveRecord: 480.5ms)

I do have the following index in place:

add_index "logs", ["sensor_id", "id"], :name => "index_logs_on_sensor_id_and_id", :order => {"id"=>:desc}

My controller / lookup code is the following:

class SensorsController < ApplicationController
def index
@sensors = Sensor.all

  1. How do I make the load time reasonable?

  2. Is there a way to avoid the N+1 and reload this?

I had thought of putting a
reference on to
and then updating this every time a new log for that sensor is posted - but something in my head is telling me that other developers would say this is a bad thing. Is this the case?

How are problems like this usually solved?


There are 2 relatively easy ways to do this:

  • Use ActiveRecord eager loading to pull in just the most recent logs
  • Roll your own mini eager loading system (as a Hash) for just this purpose

Basic ActiveRecord approach:

subquery ="MAX('id')")
@sensors = Sensor.eager_load(:logs).where(logs: {id: subquery}).all

Note that you should NOT use your most_recent_log method for each sensor (that will trigger an N+1), but rather logs.first. Only the latest log for each sensor will actually be prefetched in the logs collection.

Rolling your own may be more efficient from a SQL perspective, but more complex to read and use:

@sensors = Sensor.all
logs = Log.where(id:"MAX('id')"))
@sensor_logs = logs.each_with_object({}){|log, hash|
  hash[log.sensor_id] = log

@sensor_logs is a Hash, permitting a fast lookup for the latest log by

Regarding your comment about storing the latest log id - you are essentially asking if you should build a cache. The answer would be 'it depends'. There are many advantages and many disadvantages to caching, so it comes down to 'is the benefit worth the cost'. From what you are describing, it doesn't appear that you are familiar with the difficulties they introduce (Google 'cache invalidation') or if they are applicable in your case. I'd recommend against it until you can demonstrate that a) it is adding real value over a non-cache solution, and b) it can be safely applied for your scenario.