Freewind Freewind - 5 months ago 164
Scala Question

Best way to merge two maps and sum the values of same key?

val map1 = Map(1 -> 9 , 2 -> 20)
val map2 = Map(1 -> 100, 3 -> 300)

I want to merge them, and sum the values of same keys. So the result will be:

Map(2->20, 1->109, 3->300)

Now I have 2 solutions:

val list = map1.toList ++ map2.toList
val merged = list.groupBy ( _._1) .map { case (k,v) => k -> }


val merged = (map1 /: map2) { case (map, (k,v)) =>
map + ( k -> (v + map.getOrElse(k, 0)) )

But I want to know if there are any better solutions.


Scalaz has the concept of a Semigroup which captures what you want to do here, and leads to arguably the shortest/cleanest solution:

scala> import scalaz._
import scalaz._

scala> import Scalaz._
import Scalaz._

scala> val map1 = Map(1 -> 9 , 2 -> 20)
map1: scala.collection.immutable.Map[Int,Int] = Map(1 -> 9, 2 -> 20)

scala> val map2 = Map(1 -> 100, 3 -> 300)
map2: scala.collection.immutable.Map[Int,Int] = Map(1 -> 100, 3 -> 300)

scala> map1 |+| map2
res2: scala.collection.immutable.Map[Int,Int] = Map(1 -> 109, 3 -> 300, 2 -> 20)

Specifically, the binary operator for Map[K, V] combines the keys of the maps, folding V's semigroup operator over any duplicate values. The standard semigroup for Int uses the addition operator, so you get the sum of values for each duplicate key.

Edit: A little more detail, as per user482745's request.

Mathematically a semigroup is just a set of values, together with an operator that takes two values from that set, and produces another value from that set. So integers under addition are a semigroup, for example - the + operator combines two ints to make another int.

You can also define a semigroup over the set of "all maps with a given key type and value type", so long as you can come up with some operation that combines two maps to produce a new one which is somehow the combination of the two inputs.

If there are no keys that appear in both maps, this is trivial. If the same key exists in both maps, then we need to combine the two values that the key maps to. Hmm, haven't we just described an operator which combines two entities of the same type? This is why in Scalaz a semigroup for Map[K, V] exists if and only if a Semigroup for V exists - V's semigroup is used to combine the values from two maps which are assigned to the same key.

So because Int is the value type here, the "collision" on the 1 key is resolved by integer addition of the two mapped values (as that's what Int's semigroup operator does), hence 100 + 9. If the values had been Strings, a collision would have resulted in string concatenation of the two mapped values (again, because that's what the semigroup operator for String does).

(And interestingly, because string concatenation is not commutative - that is, "a" + "b" != "b" + "a" - the resulting semigroup operation isn't either. So map1 |+| map2 is different from map2 |+| map1 in the String case, but not in the Int case.)