Nathan Long Nathan Long - 1 year ago 42
Ruby Question

How does Ruby's sort_by {rand} work?

I think this is a great Ruby one-liner:

someArray.sort_by {rand}

It's concise, it's readable, and it works - but I don't quite understand how. Here's what I know:

  1. rand
    evaluates to a number between 0 and 1 (like 0.783468632804653)

  2. rand
    is being repeatedly evaluated in the code above, because assigning it to
    first breaks the random sort

  3. sort_by {0.783468632804653}
    , or any other number I tried, has no effect on the array wasn't much help to me in this case.

Can someone explain this step-by-step?


I've been using Ruby longer now, and I see that I was missing a concept or two here. The key thing is that:

  1. rand
    is a method (defined on Kernel); it generates a random number

  2. {rand}
    is a block, which
    keeps, calling it each time it wants to compare two items in the collection. If the collection is a bunch of objects representing countries, it needs to be able to grab two of them and determine which one comes first. Do you put the one with the longest name first? The one with the largest land mass? The block should answer that question by returning a value that says "you asked about Spain vs Cameroon, and I say Cameroon comes first." (You could do that with

The rest of how
works is explained in the documentation. I'm still not quite sure why returning a random number works at all - presumably
rounds it to -1, 0, or 1, whichever is closest? But in any case, getting a different random number every time you call the block is quite different from getting the same number every time. When
says "which of these two countries comes first?",
puts on a blindfold, turns around 10 times, points and says "that one!" :)

Answer Source

In Ruby 1.8/1.9 both sort and sort_by are implemented in C, this is a rough equivalent of how this works:

Say you start with [1,2,3,4] and call sort_by{rand}:

  1. (I invented some random numbers):

    An array of tuples is created: [[0.12232, 1],[0.53434, 2],[0.333, 3],[0.99, 4]]

    In roughly equivalent Ruby code this is: [1,2,3,4].map{|x| [rand, x]}

  2. Ruby's quick sort is performed on the array based off the first element: (note the internal implementation is far from trivial and contains a ton of optimisations for already ordered arrays and such)

    [[0.12232, 1],[0.333, 3],[0.53434, 2],[0.99, 4]]

    In rough Ruby this step is: ary.sort{|x,y| x[0] <=> y[0]}

  3. Pointers are copied from the new sorted array, to the correct position in the original array.


    In rough Ruby this step is:{|x,y| y}

This technique is sometimes referred to as a "Schwartzian Transform". Caching means that the expensive operation is not performed more than N times. Meaning, this is a very efficient way of randomizing an array.

Note: array.shuffle! will be the most efficient built-in way to shuffle an array (in-place) since it uses a modern version of Fisher-Yates:

static VALUE
rb_ary_shuffle_bang(VALUE ary)
    long i = RARRAY_LEN(ary);

    while (i) {
  long j = rb_genrand_real()*i;
  VALUE tmp = RARRAY_PTR(ary)[--i];
  RARRAY_PTR(ary)[i] = RARRAY_PTR(ary)[j];
  RARRAY_PTR(ary)[j] = tmp;
    return ary;