Ethan Xu - 1 month ago
Scala Question

Cartesian product of values for each key

Given a pair RDD, how do I generate another RDD with the same key set, whose values are the Cartesian product of the original values for each key?

Here is what I mean:

//Given
(K1, V1)
(K1, V2)
(K2, W1)
(K2, W2)

//Want
(K1, (V1, V1))
(K1, (V1, V2))
(K1, (V2, V2))
(K2, (W1, W1))
(K2, (W1, W2))
(K2, (W2, W2))
//Note (V2, V1) and (W2, W1) are not required, but having them in the result is not a big deal either.


Being new to Scala and Spark, I don't see an easy solution using built-in transformations such as mapValues. Am I missing some magic function? Thanks a lot.

Answer

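Just join the RDD with itself. For each key this yields every ordered pair of its values, so (V2, V1) and (W2, W1) are included as well: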

rdd.join(rdd)
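
If the mirrored pairs should be dropped, one option is to filter the join result. The following is a minimal sketch, not the only way to do it. It assumes the values are strings that can be compared with <=, and the names SelfJoinSketch and unorderedPairs are made up for illustration, as is the local-mode setup.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SelfJoinSketch {
  // Hypothetical helper: per-key Cartesian product via a self-join,
  // keeping only pairs (a, b) with a <= b so mirrored pairs are dropped.
  // Assumes String values; other value types need their own ordering.
  def unorderedPairs(rdd: RDD[(String, String)]): RDD[(String, (String, String))] =
    rdd.join(rdd).filter { case (_, (a, b)) => a <= b }

  def main(args: Array[String]): Unit = {
    // Local-mode context, just for trying the sketch out.
    val conf = new SparkConf().setAppName("self-join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // The sample data from the question.
    val rdd = sc.parallelize(Seq(
      ("K1", "V1"), ("K1", "V2"),
      ("K2", "W1"), ("K2", "W2")
    ))

    // Sort after collecting only to make the printed order stable.
    unorderedPairs(rdd).collect().sorted.foreach(println)

    sc.stop()
  }
}
```

On the sample data this should leave exactly the six pairs listed under //Want. Note that the filter only removes mirrored pairs when the values for a key are distinct. If the same value can occur more than once under a key, or the values have no natural ordering, tagging each record with an index first (for example with zipWithIndex) and comparing the indices instead is one possible workaround.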