finman - 1 year ago
Scala Question

Spark : Average of values instead of sum in reduceByKey using Scala

When reduceByKey is called with `+`, it sums all values with the same key. Is there any way to calculate the average of the values for each key?

// I calculate the sum like this and don't know how to calculate the avg.
// The RDD's contents look like this (key is a (String, Int) pair, value is a number):

Array(((Type1,1),4.0), ((Type1,1),9.2), ((Type1,2),8), ((Type1,2),4.5), ((Type1,3),3.5),
      ((Type1,3),5.0), ((Type2,1),4.6), ((Type2,1),4), ((Type2,1),10), ((Type2,1),4.3))

Answer

One way is to use mapValues together with reduceByKey, which is simpler than aggregateByKey: pair each value with a count of 1, sum both the values and the counts per key, then divide.

.mapValues(x => (x, 1))                              // pair each value with a count of 1
.reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))   // sum values and counts per key
.mapValues(y => 1.0 * y._1 / y._2)                   // 1.0 * forces floating-point division
.collect
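The same sum-and-count trick can be checked without a Spark cluster. The sketch below (an assumption for illustration, using plain Scala collections rather than an RDD) applies the identical arithmetic to the question's sample data: `groupMapReduce` plays the role of `reduceByKey`, combining `(sum, count)` pairs per key before the final division.

```scala
// Hypothetical sample data mirroring the question's RDD contents:
// keys are (type, id) pairs, values are numbers.
val data = Seq(
  (("Type1", 1), 4.0), (("Type1", 1), 9.2),
  (("Type1", 2), 8.0), (("Type1", 2), 4.5),
  (("Type1", 3), 3.5), (("Type1", 3), 5.0),
  (("Type2", 1), 4.6), (("Type2", 1), 4.0),
  (("Type2", 1), 10.0), (("Type2", 1), 4.3)
)

// Same shape as mapValues + reduceByKey + mapValues:
// 1. pair each value with a count of 1,
// 2. reduce per key by adding the (sum, count) pairs,
// 3. divide sum by count to get the average.
val averages: Map[(String, Int), Double] =
  data
    .map { case (k, v) => (k, (v, 1)) }
    .groupMapReduce(_._1)(_._2) { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
    .map { case (k, (sum, count)) => (k, sum / count) }

// e.g. averages(("Type2", 1)) is (4.6 + 4.0 + 10.0 + 4.3) / 4 = 5.725
```

Because both values and counts are combined with an associative, commutative addition, the per-key reduction is safe to run in any grouping order, which is exactly the property `reduceByKey` relies on to merge partial results across partitions.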
