user1735921 - 4 months ago
Ruby Question

Most efficient way of comparing arrays with hashes inside them based on a unique hash key in Ruby

I have two variables, hash_a and hash_b, which are actually arrays with hashes inside them. Each of those hashes has a unique key, :unique_key.

hash_a = [
{:unique_key => 1, :data => 'data for A1'},
{:unique_key => 2, :data => 'data for A2'},
{:unique_key => 3, :data => 'data for A3'}
]

hash_b = [
{:unique_key => 1, :data => 'data for B1'},
{:unique_key => 2, :data => 'data for B2'},
{:unique_key => 4, :data => 'data for B4'},
{:unique_key => 5, :data => 'data for B5'}
]


Now I want to find the difference between hash_a and hash_b, comparing by :unique_key, so that hash_c is hash_a plus the new hashes that are present only in hash_b.
In terms of the new entries, I basically want
hash_b - hash_a


So hash_c should be this:

[
{:unique_key => 1, :data => 'data for A1'},
{:unique_key => 2, :data => 'data for A2'},
{:unique_key => 3, :data => 'data for A3'},
{:unique_key => 4, :data => 'data for B4'},
{:unique_key => 5, :data => 'data for B5'}
]


I have tried something like this:

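hash_c = hash_a # note: this aliases hash_a, so the pushes below also modify hash_a (hash_a.dup would copy it)
hash_b.each do |inner_bhash|
  found = 0

  # look for an entry in hash_a with the same unique key
  hash_a.each do |inner_ahash|
    if inner_ahash[:unique_key] == inner_bhash[:unique_key]
      found = 1
      break
    end
  end

  # no match in hash_a, so this entry is new
  if found == 0
    hash_c.push(inner_bhash)
  end
end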


This does the trick, but I want a better, more efficient way, perhaps something hash-based. I am not sure what.




In addition, I may want to see only the new entries, i.e.

[
{:unique_key => 4, :data => 'data for B4'},
{:unique_key => 5, :data => 'data for B5'}
]


I can do that in my code by replacing

hash_c = hash_a


with

hash_c = []


but how could I meet this requirement with the more efficient approach as well?

Answer

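With hashes you can use merge to do what you want. Convert each array into a hash keyed by :unique_key (group_by does this), merge the two so that the entries from hash_a win on duplicate keys, and flatten the values back into an array: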

hash_b.group_by { |e| e[:unique_key] }.
   merge(hash_a.group_by { |e| e[:unique_key] }).values.flatten
# => [{:unique_key=>1, :data=>"data for A1"}, 
#     {:unique_key=>2, :data=>"data for A2"}, 
#     {:unique_key=>4, :data=>"data for B4"}, 
#     {:unique_key=>5, :data=>"data for B5"}, 
#     {:unique_key=>3, :data=>"data for A3"}]
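Note that the merge result above puts key 3 last, because keys that exist only in hash_a are appended after the keys of hash_b. If the order in the question matters (all of hash_a first, then the new entries), one possible sketch is to concatenate the arrays and deduplicate with Array#uniq, which keeps the first occurrence for each block value. The helper name union_by_key is made up for illustration and is not part of Ruby.

```ruby
# Hypothetical helper (the name is an assumption, not a built-in):
# union of two arrays of hashes, deduplicated on the given key.
# Array#uniq keeps the first occurrence, so entries from `first` win.
def union_by_key(first, second, key = :unique_key)
  (first + second).uniq { |entry| entry[key] }
end

hash_a = [
  { unique_key: 1, data: 'data for A1' },
  { unique_key: 2, data: 'data for A2' },
  { unique_key: 3, data: 'data for A3' }
]

hash_b = [
  { unique_key: 1, data: 'data for B1' },
  { unique_key: 2, data: 'data for B2' },
  { unique_key: 4, data: 'data for B4' },
  { unique_key: 5, data: 'data for B5' }
]

hash_c = union_by_key(hash_a, hash_b)
# Keys 1, 2, 3 come from hash_a, followed by keys 4 and 5 from hash_b.
hash_c.each { |entry| puts "#{entry[:unique_key]}: #{entry[:data]}" }
```

Because first + second builds a new array, neither input is modified.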

If you want just the entries of hash_b whose key does not appear in hash_a, and you already have the result above, you can simply subtract hash_a from it:

hash_b.group_by { |e| e[:unique_key] }.
  merge(hash_a.group_by { |e| e[:unique_key] }).values.flatten - hash_a
# => [{:unique_key=>4, :data=>"data for B4"}, 
#     {:unique_key=>5, :data=>"data for B5"}]
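It may help to see why the subtraction has to happen after the merge and why a plain hash_b - hash_a is not enough. Array#- compares elements by value, and two hashes are equal only if all of their pairs are equal, so an entry with the same :unique_key but different :data is not removed. A small sketch with shortened sample data (the variable names direct, merged and remaining are only for illustration):

```ruby
hash_a = [{ unique_key: 1, data: 'data for A1' }]
hash_b = [
  { unique_key: 1, data: 'data for B1' },
  { unique_key: 4, data: 'data for B4' }
]

# Same :unique_key but different :data, so nothing is removed here.
direct = hash_b - hash_a
puts direct.size # prints 2

# After the merge, the entry for key 1 is the one from hash_a,
# so subtracting hash_a removes it and only the new entry is left.
merged = hash_b.group_by { |e| e[:unique_key] }
               .merge(hash_a.group_by { |e| e[:unique_key] })
               .values.flatten
remaining = merged - hash_a
puts remaining.size # prints 1
```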

Another, more straightforward way is to keep only the elements of hash_b that have no entry with the same key in hash_a:

hash_b.select { |x| hash_a.none? { |y| x[:unique_key] == y[:unique_key] } }
# => [{:unique_key=>4, :data=>"data for B4"}, 
#     {:unique_key=>5, :data=>"data for B5"}]
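The select/none? version rescans hash_a for every element of hash_b, which is the same nested-loop cost as the code in the question. Since the question asks about efficiency, here is a sketch that first collects the keys of hash_a into a Set, so each lookup is roughly constant time. The method name new_entries is an assumption for illustration, not an existing API.

```ruby
require 'set'

# Hypothetical helper (the name is an assumption): returns the entries of
# `incoming` whose key does not occur in `existing`.
def new_entries(existing, incoming, key = :unique_key)
  # One pass over `existing` to index its keys.
  seen = existing.map { |entry| entry[key] }.to_set
  # One pass over `incoming`, with a Set lookup per entry.
  incoming.reject { |entry| seen.include?(entry[key]) }
end

hash_a = [
  { unique_key: 1, data: 'data for A1' },
  { unique_key: 2, data: 'data for A2' },
  { unique_key: 3, data: 'data for A3' }
]

hash_b = [
  { unique_key: 1, data: 'data for B1' },
  { unique_key: 2, data: 'data for B2' },
  { unique_key: 4, data: 'data for B4' },
  { unique_key: 5, data: 'data for B5' }
]

# Only the entries for keys 4 and 5 remain.
new_entries(hash_a, hash_b).each { |entry| puts entry[:data] }
```

For small arrays the difference is negligible; the Set only pays off as the arrays grow.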