sealocal - 2 years ago
Ruby Question

How do I optimize an ActiveRecord find_in_batches query?

I'm using Rails 4.0.0 and Ruby 2.0.0. My Post (as in blog posts) model is associated with a user via a combination of the user's user_name, first_name, and last_name. I'd like to migrate the data so that posts are associated to users by a foreign key, the user's id.

I have about 11 million records in the posts table.

I'm running the code below to migrate the data, using a rake task on a Linux server. However, the task keeps getting "Killed" by the server, presumably because this code consumes too much memory.

I've found that lowering the batch_size to 20 and increasing the sleep duration lets the task run longer and update more records in total before being Killed, but it takes significantly more time.

How can I optimize this code for speed and memory usage?

Post.where(user_id: nil).find_in_batches(batch_size: 1000) do |posts|
  puts "*** Updating batch beginning with post #{posts.first.id}..."
  sleep(10) # Hopefully, saving some memory usage.
  posts.each do |post|
    user = User.find_by(user_name: post.user_name, first_name: post.first_name, last_name: post.last_name)
    begin
      post.update(user_id: user.id)
    rescue NoMethodError => error # user could be nil, so user.id will raise a NoMethodError
      puts "No user found."
    end
  end
  puts "*** Finished batch."
end
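One lever independent of batch_size: the loop above issues one User.find_by query per post (roughly 11 million queries). If the users table fits in memory, a single preloaded lookup hash keyed by the three name columns removes the per-post query. A minimal plain-Ruby sketch of the idea, using hypothetical hash data in place of ActiveRecord objects:

```ruby
# Hypothetical stand-ins for User and Post rows.
users = [
  { id: 1, user_name: "jdoe",   first_name: "John", last_name: "Doe" },
  { id: 2, user_name: "asmith", first_name: "Anna", last_name: "Smith" },
]
posts = [
  { id: 10, user_id: nil, user_name: "jdoe",  first_name: "John", last_name: "Doe" },
  { id: 11, user_id: nil, user_name: "ghost", first_name: "No",   last_name: "One" },
]

# Build the lookup once, instead of querying once per post.
lookup = users.each_with_object({}) do |u, h|
  h[[u[:user_name], u[:first_name], u[:last_name]]] = u[:id]
end

posts.each do |post|
  # Unmatched posts simply keep user_id nil.
  post[:user_id] = lookup[[post[:user_name], post[:first_name], post[:last_name]]]
end
```

In real code the lookup could be built with User.pluck(:user_name, :first_name, :last_name, :id), and the writes still done in batches.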

z5h
Answer Source

Do all the work in the database, which is WAY faster than moving data back and forth.

This can be accomplished with ActiveRecord. Of course PLEASE test this before you unleash it on important data.

Post
  .where(user_id: nil)
  .joins("INNER JOIN users ON posts.user_name = users.user_name AND posts.first_name = users.first_name AND posts.last_name = users.last_name")
  .update_all("posts.user_id = users.id")
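One caveat, stated as an assumption to verify on your adapter: update_all across a join is only translated cleanly on MySQL; on PostgreSQL the SET clause cannot reference the joined users table, so the usual workaround is a single raw UPDATE ... FROM statement. A sketch that builds that SQL (table and column names taken from the question; the FROM-clause syntax is PostgreSQL-specific):

```ruby
# Single-statement update matching posts to users on the three name
# columns from the question. Built as a string here; run it with
# ActiveRecord::Base.connection.execute(sql) in one round trip.
sql = <<-SQL
  UPDATE posts
  SET user_id = users.id
  FROM users
  WHERE posts.user_id IS NULL
    AND posts.user_name  = users.user_name
    AND posts.first_name = users.first_name
    AND posts.last_name  = users.last_name
SQL

puts sql
```

Either way, an index covering users (user_name, first_name, last_name) will matter far more than batch size for an 11-million-row update.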