hli - 14 days ago
Scala Question

Find the number of common elements in an RDD of (Array[Int], Array[Int])

I have an RDD of tuples of type (Array[Int], Array[Int]). For each tuple, I would like to know how many elements the two arrays have in common. What is the best way to do this?

Answer

The number of common elements in two arrays is the size of their set intersection:

// Maps each (x, y) pair to the number of distinct values that appear in both arrays
rdd.map { case (x, y) => x.toSet.intersect(y.toSet).size }
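To try this without a Spark cluster, here is a minimal sketch that applies the same function to a plain Seq standing in for the RDD. The Seq, the sample data, and the helper names `commonDistinct` and `commonWithDuplicates` are hypothetical, chosen for illustration. Because the set intersection counts each shared value once, the sketch also shows a variant for the case where repeated values should count every time they match. It uses the multiset `intersect` that Scala sequences provide.

```scala
object CommonElements {
  // Distinct values shared by both arrays (a repeated value counts once).
  def commonDistinct(x: Array[Int], y: Array[Int]): Int =
    x.toSet.intersect(y.toSet).size

  // Multiset intersection: a value occurring twice in both arrays counts twice.
  def commonWithDuplicates(x: Array[Int], y: Array[Int]): Int =
    x.toSeq.intersect(y.toSeq).size

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data; a Seq stands in for RDD[(Array[Int], Array[Int])].
    val pairs = Seq(
      (Array(1, 2, 3, 4), Array(3, 4, 5)),
      (Array(1, 1, 2), Array(1, 1, 1, 3)),
      (Array(7, 8), Array(9))
    )

    val distinctCounts = pairs.map { case (x, y) => commonDistinct(x, y) }
    val multisetCounts = pairs.map { case (x, y) => commonWithDuplicates(x, y) }

    println(distinctCounts) // List(2, 1, 0)
    println(multisetCounts) // List(2, 2, 0)
  }
}
```

The body of the `map` is the same on a real RDD. For example, `rdd.map { case (x, y) => x.toSeq.intersect(y.toSeq).size }` gives the duplicate-aware count. Which variant is right depends on whether "common elements" should count repeats.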