Darshan Darshan - 6 months ago 35
Scala Question

Count operation in reduceByKey in spark

val temp1 = tempTransform.map({ temp => ((temp.getShort(0), temp.getString(1)), (USAGE_TEMP.getDouble(2), USAGE_TEMP.getDouble(3)))})
.reduceByKey((x, y) => ((x._1+y._1),(x._2+y._2)))

Here I have performed Sum operation But Is it possible to do count operation inside reduceByKey.

Like what i think,

reduceByKey((x, y) => (math.count(x._1),(x._2+y._2)))

But this is not working any suggestion please.


Well, counting is equivalent to summing 1s, so just map the first item in each value tuple into 1 and sum both parts of the tuple like you did before:

val temp1 = tempTransform.map { temp => 
   ((temp.getShort(0), temp.getString(1)), (1, USAGE_TEMP.getDouble(3)))
.reduceByKey((x, y) => ((x._1+y._1),(x._2+y._2)))

Result would be an RDD[((Short, String), (Int, Double))] where the first item in the value tuple (the Int) is the number of original records matching that key.

That's actually the classic map-reduce example - word count.