CanadaIT CanadaIT - 9 days ago 5
Ruby Question

When is a block or object that is passed to Hash.new created or run?

I'm going through ruby koans and I am having a little trouble understanding when this code will be run:

hash = Hash.new {|hash, key| hash[key] = [] }


If there are no values in the hash, when does the new array get assigned to a given key in the Hash? Does it happen the first time a hash value is accessed without first assigning it? Please help me understand when exactly default values are created for any given hash key.

Answer

For the benefit of those new to Ruby, I have discussed alternative approaches to the problem, including the one that is the substance of this question.

The task

Suppose you are given an array

arr = [[:dog, "fido"], [:car, "audi"], [:cat, "lucy"], [:dog, "diva"], [:cat, "bo"]]  

and wish to to create the hash

{ :dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"] }

First solution

h = {}
arr.each do |k,v|
  h[k] = [] unless h.key?(k)
  h[k] << v
end
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

This is quite straightforward.

Second solution

More Ruby-like is to write:

h = {}
arr.each { |k,v| (h[k] ||= []) << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

When Ruby sees (h[k] ||= []) << v the first thing she does is expand it to

(h[k] = h[k] || []) << v

If h does not have a key k, h[k] #=> nil, so the expression becomes

(h[k] = nil || []) << v

which becomes

(h[k] = []) << v

so

h[k] #=> [v]

Note that h[k] on the left of equality uses the method Hash#[]=, whereas h[k] on the right employs Hash#[].

This solution requires that none of the hash values equal nil.

Third solution

A third approach is to give the hash a default value. If a hash h does not have a key k, h[k] returns the default value. There are two types of default values.

Passing the default value as an argument to Hash::new

If an empty array is passed as an argument to Hash::new, that value becomes the default value:

a = []
a.object_id
  #=> 70339916855860
g = Hash.new(a)
  #=> {}

g[k] returns [] when h does not have a key k. (The hash is not altered, however.) This construct has important uses, but it is inappropriate here. To see why, suppose we write

x = g[:cat] << "bo"
  #=> ["bo"] 
y = g[:dog] << "diva"
  #=> ["bo", "diva"] 
x #=> ["bo", "diva"]

This is because the values of :cat and :dog are both set equal to the same object, an empty array. We can see this by examining object_ids:

x.object_id
  #=> 70339916855860 
y.object_id
  #=> 70339916855860 

Giving Hash::new a block which returns the default value

The second form of default value is to perform a block calculation. If we define the hash with a block:

h = Hash.new { |h,k| h[key] = [] }

then if h does not have a key k, h[k] will be set equal to the value returned by the block, in this case an empty array. Note that the block variable h is the newly-created empty hash. This allows us to write

h = Hash.new { |h,k| h[k] = [] }
arr.each { |k,v| h[k] << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

As the first element passed to the block is arr.first, the block variables are assigned values by evaluating

k, v = arr.first
  #=> [:dog, "fido"] 
k #=> :dog 
v #=> "fido" 

The block calculation is therefore

h[k] << v
  #=> h[:dog] << "fido"

but since h does not (yet) have a key :dog, the block is triggered, setting h[k] equal to [] and then that empty array is appended with "fido", so that

h #=> { :dog=>["fido"] }

Similarly, after the next two elements of arr are passed to the block we have

h #=> { :dog=>["fido"], :car=>["audi"], :cat=>["lucy"] }

When the next (fourth) element of arr is passed to the block, we evaluate

h[:dog] << "diva"

but now h does have a key, so the default does not apply and we end up with

h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy"]} 

The last element of arr is processed similarly.

Note that, when using Hash::new with a block, we could write something like this:

h = Hash.new { launch_missiles("any time now") }

in which case h[k] would be set equal to the return value of launch_missiles. In other words, anything can be done in the block.

Even more Ruby-like

Lastly, the more Ruby-like way of writing

h = Hash.new { |h,k| h[k] = [] }
arr.each { |k,v| h[k] << v }
h #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

is to use Enumerable#each_with_object:

arr.each_with_object(Hash.new { |h,k| h[k] = [] }) { |k,v| h[k] << v }
  #=> {:dog=>["fido", "diva"], :car=>["audi"], :cat=>["lucy", "bo"]}

which eliminates two lines of code.

Which is best?

Personally, I am indifferent to the second and third solutions. Both are used in practice.

Comments