mitchus - 1 year ago 123
Scala Question

Access key from mapValues or flatMapValues?

In Spark 1.3, is there a way to access the key from mapValues?

Specifically, if I have

val y = x.groupBy(someKey)
val z = y.mapValues(someFun)

is there a way for someFun to know which key of y it is currently operating on?

Or do I have to do

val y = x.map(r => (someKey(r), r)).groupBy(_._1)
val z = y.mapValues{ case (k, r) => someFun(r, k) }

Note: the reason I want to use mapValues rather than map is to preserve the partitioning.

Answer

In this case you can use mapPartitions with the preservesPartitioning attribute.

x.mapPartitions(it => it.map { case (k, rr) => (k, someFun(rr, k)) }, preservesPartitioning = true)

You just have to make sure you are not changing the partitioning, i.e. don't change the key.
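
For a fuller picture, here is a minimal sketch of the same idea applied to the grouped RDD y from the question. The bodies of someKey and someFun, the sample data, and the local master setting are all made up for illustration; only groupBy, mapPartitions, and the partitioner field are actual Spark API.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object KeyAwareMapValues {
  // Hypothetical stand-ins for the question's someKey and someFun.
  def someKey(r: Int): Int = r % 3
  def someFun(rr: Iterable[Int], k: Int): Int = rr.sum + k

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[2]").setAppName("key-aware-mapValues")
    val sc = new SparkContext(conf)
    try {
      val x = sc.parallelize(1 to 10)

      // groupBy shuffles the data, so y carries a partitioner.
      val y = x.groupBy(someKey _)

      // Like mapValues, but the function also sees the key.
      // The key is passed through unchanged, so keeping the partitioner is safe.
      val z = y.mapPartitions(
        it => it.map { case (k, rr) => (k, someFun(rr, k)) },
        preservesPartitioning = true)

      // With preservesPartitioning = true, z should report the same partitioner as y.
      println(z.partitioner == y.partitioner)
      z.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If preservesPartitioning is left at its default of false, z.partitioner is None and a later join or reduceByKey on z would typically shuffle again, which is what the question is trying to avoid.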