finman - 11 months ago 86

Scala Question

When I call reduceByKey with an addition function, it sums all values that share the same key. Is there a way to calculate the average of the values for each key instead?

`// I calculate the sum like this and don't know how to calculate the avg`

reduceByKey((x,y)=>(x+y)).collect

Array(((Type1,1),4.0), ((Type1,1),9.2), ((Type1,2),8), ((Type1,2),4.5), ((Type1,3),3.5),

((Type1,3),5.0), ((Type2,1),4.6), ((Type2,1),4), ((Type2,1),10), ((Type2,1),4.3))

Answer

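One way is to use `mapValues` and `reduceByKey`, which is easier than `aggregateByKey`: pair each value with a count of 1, sum the values and the counts per key, then divide each sum by its count.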

```
.mapValues(x => (x, 1)).reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2)).mapValues(y => 1.0 * y._1 / y._2).collect
```
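For comparison, the same (sum, count) idea can also be written with `aggregateByKey`. The following is only a minimal sketch, assuming a local `SparkContext` and a small sample built from a few of the pairs in the question; the object name `AverageByKey` and the helper `averageByKey` are hypothetical names chosen for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object AverageByKey {
  // Hypothetical helper: per-key average using aggregateByKey.
  // The accumulator is a (sum, count) pair, starting at (0.0, 0L).
  def averageByKey(data: RDD[((String, Int), Double)]): RDD[((String, Int), Double)] =
    data
      .aggregateByKey((0.0, 0L))(
        // fold one value into the partition-local (sum, count)
        (acc, v) => (acc._1 + v, acc._2 + 1L),
        // merge (sum, count) pairs coming from different partitions
        (a, b) => (a._1 + b._1, a._2 + b._2)
      )
      .mapValues { case (sum, count) => sum / count }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally; adjust the master for a real cluster.
    val conf = new SparkConf().setAppName("AverageByKey").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Sample data taken from the first few pairs in the question.
      val data = sc.parallelize(Seq(
        (("Type1", 1), 4.0), (("Type1", 1), 9.2),
        (("Type1", 2), 8.0), (("Type1", 2), 4.5)
      ))
      averageByKey(data).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Compared with the `mapValues` + `reduceByKey` version, this avoids creating a `(value, 1)` tuple for every record, at the cost of having to spell out the zero value and two combining functions.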

See also chapter 4 of Learning Spark: https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch04.html