Dragos Dragos - 5 months ago 7
Ruby Question

Ruby chain two 'group_by' methods

I have an array of objects that looks like this:

[
{day: 'Monday', class: 1, name: 'X'},
{day: 'Monday', class: 2, name: 'Y'},
{day: 'Tuesday', class: 1, name: 'Z'},
{day: 'Monday', class: 1, name: 'T'}
]


I want to group them by days, and then by classes i.e.

groupedArray['Monday'] => {'1' => [{name: 'X'}, {name: 'T'}], '2' => [{name: 'Y'}]}


I've seen there is a

group_by { |a| [a[:day], a[:class]] }


But this creates a hash with a [day, class] key.

Is there a way I can achieve this, without having to group them first by day, and then iterate through each day, and group them by class, then pushing them into a new hash?

Answer
arr = [
  {day: 'Monday',  class: 1, name: 'X'},
  {day: 'Monday',  class: 2, name: 'Y'},
  {day: 'Tuesday', class: 1, name: 'Z'},
  {day: 'Monday',  class: 1, name: 'T'}
]

One way of obtaining the desired hash is to use the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged. Here that is done twice, first when values of :day are the same, then for each such occurrence, when the values of :class are the same (for a given value of :day).

arr.each_with_object({}) { |g,h|
  h.update(g[:day]=>{ g[:class].to_s=>[{name: g[:name] }] }) { |_,h1,h2|
    h1.update(h2) { |_,p,q| p+q } } }
  #=> {"Monday" =>{"1"=>[{:name=>"X"}, {:name=>"T"}], "2"=>[{:name=>"Y"}]},
  #    "Tuesday"=>{"1"=>[{:name=>"Z"}]}} 

The steps are as follows.

enum = arr.each_with_object({})
  #=> #<Enumerator: [{:day=>"Monday",  :class=>1, :name=>"X"},
  #                  {:day=>"Monday",  :class=>2, :name=>"Y"},
  #                  {:day=>"Tuesday", :class=>1, :name=>"Z"},
  #                  {:day=>"Monday",  :class=>1, :name=>"T"}]:each_with_object({})> 

We can see the values that will be generated by this enumerator by converting it to an array:

enum.to_a
  #=> [[{:day=>"Monday",  :class=>1, :name=>"X"}, {}],
  #    [{:day=>"Monday",  :class=>2, :name=>"Y"}, {}],
  #    [{:day=>"Tuesday", :class=>1, :name=>"Z"}, {}],
  #    [{:day=>"Monday",  :class=>1, :name=>"T"}, {}]] 

The empty hash in each array is the hash being built and returned. It is initially empty, but will be partially formed as each element of enum is processed.
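As a small illustration of why the hash appears in every element of enum: each_with_object hands the same object to the block on every iteration and returns it at the end. The sketch below uses a made-up array of integers and hypothetical names (memo, result), not the data from the question.

```ruby
# each_with_object passes the *same* memo object to every iteration
# and returns that object when the iteration finishes.
memo = {}
result = [1, 2, 3].each_with_object(memo) { |n, h| h[n] = n * n }

result              #=> {1=>1, 2=>4, 3=>9}
result.equal?(memo) #=> true (one hash, mutated in place)
```

Because the block mutates that single hash, whatever the block itself returns is ignored; only the changes made to h matter.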

The first element of enum is passed to the block (by Enumerator#each) and the block variables are assigned using parallel assignment (sometimes called multiple assignment):

g,h = enum.next
  #=> [{:day=>"Monday", :class=>1, :name=>"X"}, {}] 
g #=> {:day=>"Monday", :class=>1, :name=>"X"} 
h #=> {} 

We now perform the block calculation:

h.update(g[:day]=>{ g[:class].to_s=>[{name: g[:name] }] })
  #=> {}.update("Monday"=>{ "1"=>[{name: "X"}] })
  #=> {"Monday"=>{"1"=>[{:name=>"X"}]}}

This operation returns the updated value of h, the hash being constructed.

Note that update's argument

"Monday"=>{ "1"=>[{name: "X"}] }

is shorthand for

{ "Monday"=>{ "1"=>[{name: "X"}] } }

Because the key "Monday" was not present in both hashes being merged (h had no keys), the block

{ |_,h1,h2| h1.update(h2) { |_,p,q| p+q } }

was not used to determine the value of "Monday".
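A minimal sketch of this behaviour, using made-up keys "a" and "b" and a hypothetical calls array to record when the block runs: the block given to update is invoked only for keys present in both hashes.

```ruby
calls = []            # records every key for which the block is invoked
h = { "a" => 1 }

h.update("a" => 10, "b" => 20) do |key, old_val, new_val|
  calls << key        # only reached for keys present in both hashes
  old_val + new_val
end

h     #=> {"a"=>11, "b"=>20}
calls #=> ["a"]
```

The key "b" is simply added, while "a" gets its value from the block.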

Now the next value of enum is passed to the block and the block variables are assigned:

g,h = enum.next
  #=> [{:day=>"Monday", :class=>2, :name=>"Y"},
  #    {"Monday"=>{"1"=>[{:name=>"X"}]}}] 
g #=> {:day=>"Monday", :class=>2, :name=>"Y"} 
h #=> {"Monday"=>{"1"=>[{:name=>"X"}]}}

Note that h was updated. We now perform the block calculation:

h.update(g[:day]=>{ g[:class].to_s=>[{name: g[:name] }] })
  #=> {"Monday"=>{"1"=>[{:name=>"X"}]}}.update("Monday"=>{ "2"=>[{name: "Y"}] })

Both hashes being merged share the key "Monday". We therefore must use the block to determine the merged value of "Monday":

{ |k,h1,h2| h1.update(h2) { |m,p,q| p+q } }
  #=> {"1"=>[{:name=>"X"}]}.update("2"=>[{name: "Y"}])
  #=> {"1"=>[{:name=>"X"}], "2"=>[{:name=>"Y"}]} 

See the documentation for Hash#update for an explanation of the block variables k, h1 and h2 for the outer update and m, p and q for the inner update. k and m each hold the key that is common to the two hashes being merged. As they are not used in the block calculations, I have replaced them with underscores, which is common practice.

So now:

h #=> { "Monday" => { "1"=>[{ :name=>"X" }], "2"=>[{ :name=>"Y"}] } }

Prior to this operation the hash h["Monday"] did not yet have a key "2", so the second (inner) update did not require use of the block

{ |_,p,q| p+q }

This block is used, however, when the last element of enum is merged into h, since the values of both :day and :class are the same for the two hashes being merged.
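That final step can be reproduced in isolation. The sketch below assumes h already holds the result of processing the first three elements, and g is the last element of arr.

```ruby
# State of h after the first three elements have been merged.
h = { "Monday"  => { "1" => [{ name: "X" }], "2" => [{ name: "Y" }] },
      "Tuesday" => { "1" => [{ name: "Z" }] } }
g = { day: 'Monday', class: 1, name: 'T' }

# "Monday" is in both hashes, so the outer block runs; "1" is in both
# inner hashes, so the inner block runs and concatenates the arrays.
h.update(g[:day] => { g[:class].to_s => [{ name: g[:name] }] }) { |_, h1, h2|
  h1.update(h2) { |_, p, q| p + q } }

h["Monday"]["1"] #=> [{:name=>"X"}, {:name=>"T"}]
```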

The remaining calculations are similar.
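For comparison, here is a sketch of the two-pass approach the question mentions (group by day, then regroup each day by class). It is not the method described above, and it assumes Ruby 2.4 or later for Hash#transform_values; the variable names rows and grouped are mine.

```ruby
rows = [
  {day: 'Monday',  class: 1, name: 'X'},
  {day: 'Monday',  class: 2, name: 'Y'},
  {day: 'Tuesday', class: 1, name: 'Z'},
  {day: 'Monday',  class: 1, name: 'T'}
]

# First group by day, then regroup each day's entries by class
# (as a string key), keeping only the :name of each entry.
grouped = rows.group_by { |g| g[:day] }.transform_values do |day_rows|
  day_rows.group_by { |g| g[:class].to_s }
          .transform_values { |rs| rs.map { |g| { name: g[:name] } } }
end

grouped["Monday"]
  #=> {"1"=>[{:name=>"X"}, {:name=>"T"}], "2"=>[{:name=>"Y"}]}
```

This builds intermediate hashes for each day, whereas the update-based version constructs the nested hash in a single pass over the array.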
