mitchus - 3 months ago
Scala Question

Access key from mapValues or flatMapValues?

In Spark 1.3, is there a way to access the key from mapValues?

Specifically, if I have

val y = x.groupBy(someKey)
val z = y.mapValues(someFun)


can someFun know which key of y it is currently operating on?

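Or do I have to carry the key inside the values, like this:

val y = x.map(r => (someKey(r), r)).groupBy(_._1)
// every group is non-empty and each pair in it carries the same key
val z = y.mapValues { kvs => someFun(kvs.map(_._2), kvs.head._1) }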


Note: the reason I want to use mapValues rather than map is to preserve the partitioning.
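The partitioning difference the note refers to can be checked directly. The sketch below assumes Spark 1.3 or later with spark-core on the classpath and a local master; the sample words and the first-letter key are hypothetical stand-ins for x and someKey. It compares the partitioner Spark reports after mapValues with the one it reports after map.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PartitionerCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("partitioner-check").setMaster("local[2]"))
    try {
      // Hypothetical data and key function standing in for x and someKey.
      val x = sc.parallelize(Seq("apple", "avocado", "banana", "blueberry", "cherry"))
      val y = x.groupBy((s: String) => s.head) // shuffled, so y has a HashPartitioner

      // mapValues cannot touch the keys, so Spark keeps y's partitioner.
      val viaMapValues = y.mapValues(rs => rs.size)
      // map could change the keys, so Spark drops the partitioner.
      val viaMap = y.map { case (k, rs) => (k, rs.size) }

      println(viaMapValues.partitioner == y.partitioner) // true
      println(viaMap.partitioner.isEmpty)                 // true
    } finally {
      sc.stop()
    }
  }
}
```

A later join or reduceByKey on viaMapValues can reuse the existing partitioning, whereas viaMap would have to be shuffled again.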

Answer

In this case you can use mapPartitions with its preservesPartitioning parameter set to true:

val z = y.mapPartitions(it => it.map { case (k, rr) => (k, someFun(rr, k)) },
  preservesPartitioning = true)

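You just have to make sure you do not change the keys. preservesPartitioning = true only tells Spark that the existing partitioner is still valid; Spark does not check this, so emitting a different key would leave records in the wrong partitions.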
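The same idea can be packaged as a small reusable helper. This is a minimal sketch, assuming Spark 1.3 or later with spark-core on the classpath and a local master; the helper name mapValuesWithKey, the sample words and the stand-ins for someKey and someFun are hypothetical and not part of Spark's API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object KeyAwareMapValues {
  // Hypothetical helper: like mapValues, but f also receives the key.
  // Only valid because every key is passed through unchanged, so each
  // record still belongs to the partition it is already in.
  def mapValuesWithKey[K, V, W](rdd: RDD[(K, V)])(f: (K, V) => W): RDD[(K, W)] =
    rdd.mapPartitions(
      it => it.map { case (k, v) => (k, f(k, v)) },
      preservesPartitioning = true)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("mapValuesWithKey").setMaster("local[2]"))
    try {
      // Hypothetical stand-ins for someKey and someFun from the question.
      val someKey: String => Char = _.head
      val someFun: (Iterable[String], Char) => String = (rs, k) => s"$k:${rs.size}"

      val x = sc.parallelize(Seq("apple", "avocado", "banana", "blueberry", "cherry"))
      val y = x.groupBy(someKey) // hash-partitioned by key
      val z = mapValuesWithKey(y)((k, rs) => someFun(rs, k))

      println(z.partitioner == y.partitioner) // true
      println(z.collect().toMap)
    } finally {
      sc.stop()
    }
  }
}
```

Keeping someKey and someFun as local function values keeps the closures Spark ships to executors free of references to the enclosing object.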