Reck Reck - 1 month ago 14
Ruby Question

How can I measure mutex contention in Ruby?

I recently found myself trying to diagnose why a particular Ruby program was running slow. In the end it turned out be caused by a scaling issue causing a lot of contention on a particular mutex.

I was wondering if there are any tools that I could have used to make this issue easier to diagnose? I know I could have used ruby-prof to get detailed output of what all 100+ threads of this program were spending their time on, but I'm curious whether there is any tool that is specifically focused on just measuring mutex contention in Ruby?

Answer

So If figured out how to do this with DTrace.

Given a Ruby program like this:

# mutex.rb
mutex = Mutex.new
threads = []

threads << Thread.new do
  loop do
    mutex.synchronize do
      sleep 2
    end
  end
end

threads << Thread.new do
  loop do
    mutex.synchronize do
      sleep 4
    end
  end
end

threads.each(&:join)

We can use a DTrace script like this:

/* mutex.d */
ruby$target:::cmethod-entry
/copyinstr(arg0) == "Mutex" && copyinstr(arg1) == "synchronize"/
{
  self->file = copyinstr(arg2);
  self->line = arg3;
}

pid$target:ruby:rb_mutex_lock:entry
/self->file != NULL && self->line != NULL/
{
  self->mutex_wait_start = timestamp;
}

pid$target:ruby:rb_mutex_lock:return
/self->file != NULL && self->line != NULL/
{
  mutex_wait_ms = (timestamp - self->mutex_wait_start) / 1000;
  printf("Thread %d acquires mutex %d after %d ms - %s:%d\n", tid, arg1, mutex_wait_ms, self->file, self->line);
  self->file = NULL;
  self->line = NULL;
}

When we run this script against the Ruby program, we then get information like this:

$ sudo dtrace -q -s mutex.d -c 'ruby mutex.rb'

Thread 286592 acquires mutex 4313316240 after 2 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4004183 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2004170 ms - mutex.rb:14
Thread 286592 acquires mutex 4313316240 after 6 ms - mutex.rb:14
Thread 286592 acquires mutex 4313316240 after 4 ms - mutex.rb:14
Thread 286592 acquires mutex 4313316240 after 4 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 16012158 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2002593 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4001983 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2004418 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4000407 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 2004163 ms - mutex.rb:14
Thread 286591 acquires mutex 4313316240 after 4003191 ms - mutex.rb:6
Thread 286591 acquires mutex 4313316240 after 2 ms - mutex.rb:6
Thread 286592 acquires mutex 4313316240 after 4005587 ms - mutex.rb:14
...

We can collect this output and use it to derive information about which mutexes are causing the most contention.