I'm currently working on a project with Ruby/Rails, importing invoices to the database but trying to maximise the efficiency of the processes which is indeed too slow right now.
For an import batch with 100.000 rows it takes around 2.5 3 hours to process and save each record in the database.
//// Ruby code
def pre_process(merchant_prefix, channel_import)
# channel needed to identify invoices so an import from another channel cannot collude if they had same merchant_prefix
Jzbackend::Invoice.where(merchant_prefix: merchant_prefix, channel: channel_import.channel).delete_all
# get rid of all previous import patches which becomes empty after delete_import_strategy
Jzbackend::Import.where.not(id: channel_import.id).where(channel: channel_import.channel).destroy_all
def process_row(row, channel_import)
debt_claim = Jzbackend::Invoice.new
debt_claim.import = channel_import
debt_claim.status = 'pending'
debt_claim.channel = channel_import.channel
debt_claim.merchant_prefix = row
debt_claim.debt_claim_number = row
debt_claim.amount = Monetize.parse(row)
debt_claim.print_date = row
debt_claim.first_name = row.try(:, 4)
debt_claim.last_name = row.try(:, 5)
debt_claim.address = row.try(:, 6)
debt_claim.postal_code = row.try(:, 7)
debt_claim.city = row.try(:, 8)
First rule of mass imports: batch, batch, batch.
You're saving each row separately. This incurs HUGE overhead. Say, the insert itself takes 1ms, but the roundtrip to the database is 5ms. Total time used - 6ms. For 1000 records that's 6000ms or 6 seconds.
Now imagine that you use a mass insert, where you send data for multiple rows in the same statement. It looks like this:
INSERT INTO users (name, age) VALUES ('Joe', 20), ('Moe', 22), ('Bob', 33'), ...
Let's say, you send data for 1000 rows in this one request. The request itself takes 1000ms (but in reality it'll likely be considerably quicker too, less overhead on parsing the query, preparing the execution plan, etc.). Total time taken is 1000ms + 5ms. At least 6x reduction! (in real projects of mine, I was observing 100x-200x reduction).