railsr railsr - 5 years ago 104
Ruby Question

Grabbing JSON data from API with multi-threaded requests

I'm using HTTParty for making requests and currently have the following code:

def scr(users)
  users.times do |id|
    test_url = "siteurl/#{id}"
    Thread.new do
      response = HTTParty.get(test_url)

      open('users.json', 'a') do |f|
        f.puts "#{response.to_json}, "
      end
      p "added"
    end
  end
  sleep
end


It works OK for 100-300 records.

I tried adding Thread.exit after sleep, but if I set users to something like 200000, after a while my terminal throws an error. I don't remember the exact message, but it was something about threads and a resource being busy. Some records (about 10000) were added successfully before that.

It looks like I'm doing it wrong and need to somehow break the requests into batches.

Answer Source

On quick inspection, the problem would seem to be a race condition on your JSON file: every thread opens it and appends to it at the same time. Even if you don't get an error, you risk getting interleaved or corrupted data.
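If you'd rather keep writing records as they arrive, one option is to serialize the writes with a Mutex. Here's a rough sketch of that idea. The method name `append_records` and the block that performs the fetch are my own inventions for illustration; in your case the block would wrap the HTTParty.get call.

```ruby
require 'json'

# Hypothetical helper (not from the question): runs one thread per id and
# appends each result to `path` as one JSON line. The Mutex ensures that
# only one thread writes to the file at a time.
def append_records(path, ids, &fetch)
  lock = Mutex.new
  threads = ids.map do |id|
    Thread.new do
      line = fetch.call(id).to_json   # the fetch itself runs concurrently
      lock.synchronize do             # the file write is serialized
        File.open(path, 'a') { |f| f.puts(line) }
      end
    end
  end
  threads.each(&:join)                # wait for every thread to finish
end
```

You'd call it with something like `append_records('users.json', 0...100) { |id| HTTParty.get("siteurl/#{id}") }`. This only protects the file; it does nothing to limit how many threads get started.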

The simplest solution is probably just to do all the writing at the end:

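  def scr(users)
    threads = []
    users.times do |id|
      test_url = "siteurl/#{id}"
      # Each thread only fetches; the block's last expression is its value.
      threads << Thread.new do
        response = HTTParty.get(test_url)
        response.to_json
      end
    end
    # Thread#value waits for the thread to finish and returns its result.
    all_values = threads.map { |t| t.value }.join(', ')
    # A single write from the main thread, so no race on the file.
    open('users.json', 'a') do |f|
      f.puts all_values
    end
  end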

Wasn't able to test that, but it should do the trick. It's also better in general to be using Thread#join or Thread#value instead of sleep.
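Note that the version above still starts one thread per user, so with users around 200000 you may well hit a thread or resource limit, which is a likely source of the error you saw. To batch the work as you suggested, one common approach is a fixed pool of worker threads pulling ids from a Queue. Below is a minimal sketch under some assumptions: `scr_pooled`, the `workers` and `path` parameters, and passing the fetch as a block are all names I picked for the example, and 10 workers is an arbitrary default.

```ruby
require 'json'

# Hypothetical variant of scr: at most `workers` requests run at once.
# The block receives an id and returns the object to serialize.
def scr_pooled(users, workers: 10, path: 'users.json', &fetch)
  queue = Queue.new
  users.times { |id| queue << id }
  queue.close                        # once drained, pop returns nil

  results = Queue.new                # Queue is thread-safe, so no Mutex needed
  pool = Array.new(workers) do
    Thread.new do
      while (id = queue.pop)         # nil ends the loop; 0 is truthy in Ruby
        results << fetch.call(id).to_json
      end
    end
  end
  pool.each(&:join)                  # wait for all workers to finish

  lines = []
  lines << results.pop until results.empty?
  # Single writer: one JSON document per line, in completion order.
  File.open(path, 'a') { |f| f.puts(lines) }
  lines.size
end
```

For your case the call would look something like `scr_pooled(200_000, workers: 20) { |id| HTTParty.get("siteurl/#{id}") }`. All results are still held in memory until the end, so for very large runs you might prefer to write each batch out as it completes.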
