hli - 1 month ago
Scala Question

Find the number of common elements in an RDD of (Array[Int], Array[Int])

I have an RDD of tuples, each holding two Array[Int], and would like to know how many elements the two arrays in each tuple have in common. What is the best way to do this?

Answer

The number of common elements in two arrays is the size of their set intersection:

rdd.map { case (x, y) => x.toSet.intersect(y.toSet).size }
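Note that the set-based count ignores duplicates: a value that appears several times in both arrays is counted once. Below is a minimal sketch of the per-pair logic, using a local Seq as a stand-in for the RDD so it runs without Spark. The names CommonElements, commonDistinct and commonWithDuplicates are hypothetical and only for illustration; the second helper is an alternative, not part of the answer above, for the case where repeated values should be counted.

```scala
object CommonElements {
  // Set semantics, as in the answer: each shared value is counted once.
  def commonDistinct(x: Array[Int], y: Array[Int]): Int =
    x.toSet.intersect(y.toSet).size

  // Multiset semantics (assumed alternative): a value occurring in both
  // arrays is counted as many times as it can be paired up.
  def commonWithDuplicates(x: Array[Int], y: Array[Int]): Int =
    x.intersect(y).length

  def main(args: Array[String]): Unit = {
    // Local stand-in for the RDD's contents. With Spark, the same function
    // would be applied as: rdd.map { case (x, y) => commonDistinct(x, y) }
    val pairs = Seq(
      (Array(1, 2, 3), Array(2, 3, 4)),
      (Array(1, 1, 2), Array(1, 1, 5))
    )
    pairs.foreach { case (x, y) =>
      println(s"distinct=${commonDistinct(x, y)} withDuplicates=${commonWithDuplicates(x, y)}")
    }
  }
}
```

For the second pair, the set-based count is 1 (only the value 1 is shared), while the multiset count is 2 (both occurrences of 1 are matched). Which one is "best" depends on whether duplicates matter for your data.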