nimafl - 9 months ago 110

Python Question

I am completely new to Apache Spark and I am trying to compute the Cartesian product of two RDDs. As an example, I have A and B like:

`A = {(a1,v1),(a2,v2),...}`

`B = {(b1,s1),(b2,s2),...}`

I need a new RDD like:

`C = {((a1,v1),(b1,s1)), ((a1,v1),(b2,s2)), ...}`

Any idea how I can do this? As simple as possible :)

Thanks in advance

PS: I finally did it like this as suggested by @Amit Kumar:

`cartesianProduct = A.cartesian(B)`

Answer

That's not the dot product, that's the cartesian product. Use the `cartesian` method:

```
def cartesian[U](other: spark.api.java.JavaRDDLike[U, _]): JavaPairRDD[T, U]
```

Return the Cartesian product of this RDD and another one, that is, the RDD of all pairs of elements (a, b) where a is in `this` and b is in `other`.
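Since the question is about Python, here is a minimal sketch of the shape of the result. It models the pairing with plain lists and `itertools.product` so it runs without a Spark installation. The sample values and the helper name `cartesian_pairs` are made up for illustration. The commented lines show what I'd expect the equivalent PySpark call to look like, assuming you already have a SparkContext named `sc`.

```python
from itertools import product

# Hypothetical sample data standing in for the RDDs A and B from the question.
A = [("a1", "v1"), ("a2", "v2")]
B = [("b1", "s1"), ("b2", "s2")]


def cartesian_pairs(left, right):
    """Return every (x, y) pair with x taken from left and y from right."""
    return list(product(left, right))


C = cartesian_pairs(A, B)
for pair in C:
    print(pair)

# With PySpark and an existing SparkContext `sc` (assumed, not created here):
#   C_rdd = sc.parallelize(A).cartesian(sc.parallelize(B))
#   C_rdd.collect()  # same set of pairs; do not rely on their order
```

Note that the result has `len(A) * len(B)` elements, so on large RDDs the product can get big very quickly.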