Armand Grillet - 2 months ago
Scala Question

# Flattening the key of an RDD

I have a Spark RDD of type
`(Array[breeze.linalg.DenseVector[Double]], breeze.linalg.DenseVector[Double])`.
I wish to flatten its key to transform it into an RDD of type
`(breeze.linalg.DenseVector[Double], breeze.linalg.DenseVector[Double])`.
I am currently doing:

```scala
val newRDD = oldRDD.flatMap(ob => anonymousOrdering(ob))
```

The signature of `anonymousOrdering()` is
`String => (Array[DenseVector[Double]], DenseVector[Double])`.

This fails with the error
`type mismatch: required: TraversableOnce[?]`.
The Python code doing the same thing is:

```python
newRDD = oldRDD.flatMap(lambda point: [(tile, point) for tile in anonymousOrdering(point)])
```

How can I do the same thing in Scala? I generally use
`flatMapValues`,
but here I need to flatten the key.

If I understand your question correctly: since `anonymousOrdering` returns a single tuple rather than a `TraversableOnce`, use `map` here instead of `flatMap` (a tuple is not a collection, which is exactly what the `type mismatch` error is telling you):

```scala
val newRDD = oldRDD.map(ob => anonymousOrdering(ob))
// newRDD is RDD[(Array[DenseVector[Double]], DenseVector[Double])]
```

In that case, you can "flatten" the `Array` portion of the tuple using pattern matching and a `for`/`yield` statement:

```scala
val flatRDD = newRDD.flatMap { case (a: Array[DenseVector[Double]], b: DenseVector[Double]) =>
  for (v <- a) yield (v, b)
}
// flatRDD is RDD[(DenseVector[Double], DenseVector[Double])]
```
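The flattening step can be sanity-checked on plain Scala collections, which share `flatMap` semantics with RDDs. This sketch uses `Vector[Double]` as a dependency-free stand-in for `breeze.linalg.DenseVector[Double]` (an assumption; the pattern-match and `for`/`yield` logic is the same):

```scala
// Stand-in sketch: Seq plays the role of the RDD, Vector[Double] the role
// of breeze.linalg.DenseVector[Double].
object FlattenKeySketch {
  def main(args: Array[String]): Unit = {
    // One "record": an array of key vectors paired with a single value vector.
    val oldData: Seq[(Array[Vector[Double]], Vector[Double])] = Seq(
      (Array(Vector(1.0, 2.0), Vector(3.0, 4.0)), Vector(9.0, 9.0))
    )

    // Flatten the Array in the key: emit one (key, value) pair per element.
    val newData: Seq[(Vector[Double], Vector[Double])] =
      oldData.flatMap { case (a, b) => for (v <- a) yield (v, b) }

    newData.foreach(println)
    // (Vector(1.0, 2.0),Vector(9.0, 9.0))
    // (Vector(3.0, 4.0),Vector(9.0, 9.0))
  }
}
```

Because the `for`/`yield` over the array produces a collection of pairs, `flatMap` type-checks here, unlike the original call where the function returned a bare tuple.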

Although it's still not clear to me where or how you want to use `groupByKey`.

Source (Stack Overflow)