Sotos - 9 months ago 44

Scala Question

I am trying to apply a function to each pair in an RDD produced by `cartesian`. The function is taken from here, and I can't work out how to apply it to those pairs.

`val combined = rdd_valid.cartesian(rdd1)`

`combined.collect().foreach(a => println(a))`

(abcde,abdce)

(somethin,somthing)

(afghr, decsvt)

My first thought was to do

`val newRDD = combined.map(Levenshtein.distance)`

But it doesn't work.

Answer

Assuming `combined` has the type `RDD[(String, String)]`, and `Levenshtein.distance` has this signature:

```
def distance(s1: String, s2: String)
```

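`map` passes each element to your function as a single tuple, while `distance` takes two separate arguments, so `combined.map(Levenshtein.distance)` does not type-check. You can destructure the tuple with a pattern match and apply it as follows: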

```
val newRDD = combined.map { case (s1, s2) => Levenshtein.distance(s1, s2) }
```

Or, alternatively, access the tuple fields directly:

```
val newRDD = combined.map(t => Levenshtein.distance(t._1, t._2))
```
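A third option is to convert the two-argument method into a function that takes a tuple, using `tupled`. The sketch below is only an illustration: the `Levenshtein` object is a hypothetical stand-in (the implementation you are using may differ, and I am assuming it returns an `Int`), and a local `Seq` stands in for the RDD so the snippet runs without a Spark context. `Seq.map` accepts the same kind of function as `RDD.map`, so the three calls should carry over unchanged.

```scala
// Hypothetical stand-in for the Levenshtein object from the question;
// the Int return type is an assumption.
object Levenshtein {
  def distance(s1: String, s2: String): Int = {
    // prev(j) = edit distance between the processed prefix of s1 and s2.take(j)
    var prev: Array[Int] = Array.range(0, s2.length + 1)
    for (i <- 1 to s1.length) {
      val curr = new Array[Int](s2.length + 1)
      curr(0) = i
      for (j <- 1 to s2.length) {
        val cost = if (s1(i - 1) == s2(j - 1)) 0 else 1
        // cheapest of: deletion, insertion, substitution (or match)
        curr(j) = math.min(math.min(prev(j) + 1, curr(j - 1) + 1), prev(j - 1) + cost)
      }
      prev = curr
    }
    prev(s2.length)
  }
}

object Demo {
  // Local stand-in for the RDD[(String, String)] returned by cartesian
  val combined: Seq[(String, String)] =
    Seq(("abcde", "abdce"), ("somethin", "somthing"))

  // Pattern match, as in the first snippet of this answer
  val viaPattern: Seq[Int] =
    combined.map { case (s1, s2) => Levenshtein.distance(s1, s2) }

  // Turn the two-argument method into a function of one tuple
  val viaTupled: Seq[Int] =
    combined.map((Levenshtein.distance _).tupled)

  // Keep the inputs next to the result so each distance can be traced back
  val withInputs: Seq[(String, String, Int)] =
    combined.map { case (s1, s2) => (s1, s2, Levenshtein.distance(s1, s2)) }
}
```

If you need to know which pair each distance belongs to, the `withInputs` form is probably the most useful, since mapping to the bare distance discards the strings.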

Source (Stack Overflow)