x-yuri x-yuri - 1 month ago 7
Ruby Question

How to debug requests taking too much time to complete?

XHR requests randomly take too much time to complete, and I can't find where the time is being spent, if there is such a place at all. When I enable the profiler around suspicious Ruby code blocks, the hold-up turns out to be happening elsewhere. I couldn't reproduce it with webrick, however. Any ideas?

UPD: It's a Rails application using Sequel to connect to PostgreSQL. Here are more details on the issue I'm facing.

Answer

Here's what I did:

1) added the following code to the beginning of /usr/lib/ruby/vendor_ruby/phusion_passenger/rack/thread_handler_extension.rb:

# Append the given lines to a per-process log file.
def __l(*args)
  File.open('/home/USER/' + Process.pid.to_s + '.log', 'a') { |f| f.puts(*args) }
end

require 'ruby-prof'
require 'stringio'

# Start profiling the current request with ruby-prof.
def __start_profiler
  RubyProf.start
  # Disabled alternative: the standard library profiler.
  if false
    require 'profiler'
    $old_compile_option = RubyVM::InstructionSequence.compile_option.select { |k, v|
      [:trace_instruction, :specialized_instruction].include? k
    }
    RubyVM::InstructionSequence.compile_option = {
      :trace_instruction => true,
      :specialized_instruction => false
    }
    Profiler__::start_profile
  end
end

# Stop ruby-prof and return its flat report as a string.
def __get_profiler_output
  sio = StringIO.new
  result = RubyProf.stop
  # printer = RubyProf::GraphPrinter.new(result)
  printer = RubyProf::FlatPrinter.new(result)
  printer.print(sio)
  return sio.string
  # Unreachable: counterpart of the disabled standard library profiler above.
  if false
    Profiler__::print_profile(sio)
    RubyVM::InstructionSequence.compile_option = $old_compile_option
    sio.string
  end
end

2) added the following code at the beginning of the process_action method:

__start = Time.now
__l '-' * 80, __start, env['REQUEST_URI'], env['HTTP_X_REAL_IP']
__start_profiler

3) assigned the result of the big begin...end block at the end of the method to a variable r

4) added the following code at the end of the method:

__r = __get_profiler_output
if Time.now - __start > 10
    __l 'profiler'
    __l __r
    __l 'profiler'
end
__l 'elapsed: %g: %s' % [Time.now - __start, env['REQUEST_URI']], '-' * 80
r
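
Steps 2 to 4 boil down to timing the request and writing extra detail only when it was slow. A minimal stdlib-only sketch of that pattern, independent of Passenger and ruby-prof, might look like the following. The names log_line, with_timing and LOG_DIR are hypothetical, and a monotonic clock is swapped in for Time.now so that wall-clock adjustments cannot skew the measurement.

```ruby
require 'tmpdir'

# Hypothetical log location; the answer above writes to the home directory.
LOG_DIR = Dir.tmpdir

# Append lines to a per-process log file, like __l above.
def log_line(*args)
  File.open(File.join(LOG_DIR, "#{Process.pid}.log"), 'a') { |f| f.puts(*args) }
end

# Run the block and log how long it took. An extra marker line is written
# only when the elapsed time exceeds threshold (seconds). The block's
# result is returned unchanged, and the timing is logged even if it raises.
def with_timing(label, threshold: 10)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
ensure
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  log_line "slow request: #{label}" if elapsed > threshold
  log_line format('elapsed: %g: %s', elapsed, label)
end
```

In the setup above, the block passed to with_timing would be the big begin...end block from step 3, and the marker line would be replaced by the profiler report.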

5) before each test run, executed:

rm -f ~/*.log && touch tmp/restart.txt && watch 'grep elapsed ~/*.log | sort -gr -k2 | head'
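
The grep/sort/head part of that pipeline ranks the slowest requests across all per-process logs. For anyone who would rather do the ranking from Ruby, a rough equivalent is sketched below. slowest_requests is a hypothetical helper name, and it assumes the "elapsed: <seconds>: <uri>" line format written in step 4.

```ruby
# Collect "elapsed: <seconds>: <uri>" lines from the given log files and
# return the slowest entries as [seconds, uri] pairs, slowest first.
def slowest_requests(paths, limit = 10)
  entries = paths.flat_map do |path|
    File.foreach(path).map do |line|
      m = line.match(/\Aelapsed: (\S+): (.*)$/)
      [Float(m[1]), m[2]] if m
    end.compact
  end
  entries.sort_by { |seconds, _uri| -seconds }.first(limit)
end
```

It could be called with Dir.glob(File.expand_path('~/*.log')) as the list of paths.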

And was able to find the culprit:

 %self      total      self      wait     child     calls  name
 99.92     65.713    65.713     0.000     0.000        5   PG::Connection#async_exec
  0.00      0.002     0.002     0.000     0.000      264   Set#delete
...