Leo Brown - 3 months ago
SQL Question

Can I save memory by writing SQL instead of ActiveRecord?

Is ActiveRecord beneficial in any way other than its concise, readable syntax? Is it true that ActiveRecord takes up more memory than pure SQL? (I'm using PostgreSQL.)

I read in Alexander Dymo's blog post on Rails performance that ActiveRecord takes up more memory than pure SQL:

It's easy to manipulate the data with ActiveRecord. But ActiveRecord
is essentially a wrapper on top of your data. If you have a 1G of data
in the table, ActiveRecord representation of it will take 2G and, in
some cases, more. Yes, in 90% of cases that overhead is justified by
extra convenience that you get. But sometimes you don't need it.

I also read in the documentation that ActiveRecord is "better":

If you're used to using raw SQL to find database records, then you
will generally find that there are better ways to carry out the same
operations in Rails. Active Record insulates you from the need to use
SQL in most cases.

I have been experiencing

Error R14 (Memory quota exceeded)

on Heroku, so in an effort to solve it, I have learned about bloat and memory leaks. I have identified some ways to improve, including eager loading of ActiveRecord data, as well as reducing overall calls to the database. But I am still hungry for memory.

If the benefit of ActiveRecord is only that it is easier to write, then I will write the pure SQL where I can. But I am hoping the SO community can comment on the tradeoffs and fill in any gaps in my understanding, especially if I should think twice before moving away from AR.

Would it be Very Bad to get rid of AR altogether?


You need to look at the next paragraph in that blog post to get some context:

One example where you can avoid ActiveRecord's overhead is bulk updates. The code below will neither instantiate any model nor run validations and callbacks.

Book.where('title LIKE ?', '%Rails%').update_all(author: 'David')

The author isn't saying that you should use raw SQL to pull data (as a bunch of hashes) into Rails instead of using ActiveRecord to pull data into Rails (as a bunch of models). They're saying that you shouldn't pull a bunch of data out of the database, manipulate it in Rails, and then put it back when you can push all that work right into the database.

Consider the difference between:

Book.where('title LIKE ?', '%Rails%').update_all(author: 'David')


Book.where('title LIKE ?', '%Rails%').each do |b|
  b.update(author: 'David')
end

The first one simply sends a bit of SQL into the database:

update books
set author = 'David'
where title like '%Rails%'

and lets the database do all the work. Barely any data moves between the database and your Rails app, and your Rails app barely uses any memory for this.

The second one pulls a bunch of rows out of the database, creates a bunch of ActiveRecord models, runs a bunch of Ruby code (create the model, set a value, run validations, ...), then sends a bunch of single row SQL UPDATEs back to the database. This version transfers more data over the network, uses more memory in Rails, and uses more CPU to run more Ruby code.

The message isn't "don't use ActiveRecord", the message is to put your data manipulation logic in the right place rather than dogmatically doing everything in Rails because you heard something about "no logic in the database" somewhere.

Memory problems with ActiveRecord usually come from instantiating too many models. If you're saying Model.all, you're probably doing it wrong. If you're trying to manipulate more than a couple models at once in Rails, you're probably doing it wrong. ORMs are great for working with individual models, they're terrible for bulk operations.

If you drop down to section 2.2 of that blog post:

Sometimes the task at hand is better done with other tools. Most commonly it's a database. Why? Because Ruby is bad at processing large data sets. Like, very very bad. Remember, Ruby takes a large memory footprint. So, for example, to process 1G of data you might need 3G and more of memory. It will take several dozen seconds to garbage collect 3G. Good database can process the data in under a second. Let me show a few examples.

and you'll see the blog's author saying the same thing.