Mnemosyne - 1 month ago
Scala Question

Can't Zip RDDs with unequal number of partitions. What can I use as an alternative to zip?

I have three RDDs of the same size: rdd1 contains a String identifier, rdd2 contains a vector, and rdd3 contains an integer value.

Essentially, I want to zip those three together to get an RDD[(String, Vector, Int)], but I keep getting the error "Can't zip RDDs with unequal number of partitions". How can I bypass zip entirely to achieve this?

Answer

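Try keying each RDD by its element index with zipWithIndex and joining on that index. For rdd1 and rdd2:

rdd1.zipWithIndex.map(_.swap).join(rdd2.zipWithIndex.map(_.swap)).values

rdd3 can be joined onto the result in the same way. Unlike zip, join does not require matching partitioning, but it also does not preserve the original element order, so sort by the index before dropping it if order matters.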
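The join-on-index idea from the answer can be extended to all three RDDs. The following is only a minimal sketch under some assumptions: the names ZipByIndex, indexed and zip3 are hypothetical helpers introduced here for illustration, the element types are left generic rather than tied to a particular Vector class, and a reasonably recent Spark version is assumed (on old releases you may need import org.apache.spark.SparkContext._ for join). It also sorts by the index so that the output follows the original element order.

```scala
import scala.reflect.ClassTag

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, not part of Spark.
object ZipByIndex {

  // Key every element by its position: (element, index) becomes (index, element).
  def indexed[T](rdd: RDD[T]): RDD[(Long, T)] =
    rdd.zipWithIndex.map(_.swap)

  // Combine three equally sized RDDs element by element,
  // regardless of how each one is partitioned.
  def zip3[A: ClassTag, B: ClassTag, C: ClassTag](
      a: RDD[A], b: RDD[B], c: RDD[C]): RDD[(A, B, C)] =
    indexed(a)
      .join(indexed(b))   // (index, (a, b))
      .join(indexed(c))   // (index, ((a, b), c))
      .sortByKey()        // restore the original order
      .map { case (_, ((x, y), z)) => (x, y, z) }

  def main(args: Array[String]): Unit = {
    // Local context for demonstration only.
    val conf = new SparkConf().setMaster("local[2]").setAppName("zip-by-index")
    val sc = new SparkContext(conf)
    try {
      // Deliberately different partition counts, which would break zip.
      val ids  = sc.parallelize(Seq("a", "b", "c"), 2)
      val vecs = sc.parallelize(Seq(Array(1.0), Array(2.0), Array(3.0)), 3)
      val nums = sc.parallelize(Seq(10, 20, 30), 1)

      zip3(ids, vecs, nums).collect().foreach { case (id, v, n) =>
        println(s"$id ${v.mkString("[", ",", "]")} $n")
      }
    } finally {
      sc.stop()
    }
  }
}
```

With the RDDs from the question, the call would be ZipByIndex.zip3(rdd1, rdd2, rdd3). Keep in mind that this trades zip's cheap partition-wise pairing for two shuffles (plus one for the sort), and that zipWithIndex itself runs a Spark job when the RDD has more than one partition, so it is likely to be noticeably slower on large data.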