Ben Ben - 1 year ago 62
SQL Question

Struggling to understand why joins/includes are needed with Rails when an association exists in the model

I am trying to create a very complicated query and am having troubles with it - so I'm going back to the basics to try and figure out what I'm missing.
I've been reading the

Rails Guides
Active Record Associations
Active Record Query Interface
(specifically section 12 - joins) and I'm failing to understand how they are related and why joins/includes are needed.

The Associations page says "With Active Record associations, we can streamline these - and other - operations by declaratively telling Rails that there is a connection between the two models." Section 12.2 of the Query page says "Active Record lets you use the names of the associations defined on the model as a shortcut for specifying JOIN clauses for those associations when using the joins method."

These two statements seem somewhat at odds with each other to me. If I create as belongs_to association why do I need a join if I'm trying to pull data from both tables? Looking at it another way:

class Customer < ActiveRecord::Base
has_many :orders

class Order < ActiveRecord::Base
belongs_to :customer

If I do
@orders = Order.all
I can output the customer name by doing
. However, if I want to select all orders with 'smith' in the name I would do something like
@orders=Order.where(' ilike "%smith%"').joins(:customer)

How is it that this "relationship" is working in first half, but requires the join in the second half?

Answer Source

You do not need to join, however until you call your association your data will not be loaded.

This is a quality of ActiveRecord::Base called lazy loading.

You can see this in the SQL output from your console.

user = User.find(1)
User Load (0.2ms)  SELECT `users`.* FROM `users` WHERE `users`.`id` = 1 LIMIT 1

This particular user model has over a hundread association.

Why are none being loaded?

Because we haven't called them yet.

Article Load (0.3ms)  SELECT `articles`.* FROM `articles` WHERE `articles`.`user_id` = 1

Now we see the query is executed.

In sequence this becomes a problem when working with plain old Ruby.

For example, consider the following:

users.each do |user|
  puts user.articles.first.title

Running the following code is problematic, because every time Ruby iterates over a user, it calls the articles for only that user.

You end up querying for each user repeatedly which performs the following SQL:

Article Load (0.5ms)  SELECT `articles`.* FROM `articles` WHERE `articles`.`user_id` = 1 LIMIT 1
Article Load (0.5ms)  SELECT `articles`.* FROM `articles` WHERE `articles`.`user_id` = 2 LIMIT 1
Article Load (0.5ms)  SELECT `articles`.* FROM `articles` WHERE `articles`.`user_id` = 3 LIMIT 1
Article Load (0.5ms)  SELECT `articles`.* FROM `articles` WHERE `articles`.`user_id` = 4 LIMIT 1
Article Load (0.5ms)  SELECT `articles`.* FROM `articles` WHERE `articles`.`user_id` = 5 LIMIT 1
Article Load (0.5ms)  SELECT `articles`.* FROM `articles` WHERE `articles`.`user_id` = 6 LIMIT 1

We can solve this by loading all of our data initially within a single query.

users.joins(:articles).each do |user|
  puts user.articles.first.title

Which will perform the following SQL before the enumeration begins:

Article Load (0.5ms)  SELECT `articles`.* FROM `articles` WHERE `articles`.`user_id` IN(1, 2, 3, 4, 5, 6, etc.)

This is where ActiveRecord::Base methods like includes and joins come into play.

Here are two good articles on the matter:

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download