NikhilaR - 21 days ago
Scala Question

Loop inside Spark RDD filter

I am new to Spark and am trying to code in Scala. I have an RDD that consists of data in the form:

1: 2 3 5
2: 5 6 7
3: 1 8 9
4: 1 2 4


and another list of the form [1,4,8,9].

I need to filter the RDD so that it keeps only those lines in which either the value before ':' is present in the list or any of the values after ':' is present in the list.

I have written the following code:

val links = linksFile.filter(t => {
  val l = t.split(": ")
  root.contains(l(0).toInt) ||
  for (x <- l(0).split(" ")) {
    root.contains(x.toInt)
  }
})


linksFile is the RDD, and root is the list.

But this doesn't work. Any suggestions?

Answer

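You're close: the for loop evaluates root.contains(x.toInt) for each element but throws the result away. A for loop without yield returns Unit, not Boolean, so it can't be the right-hand side of ||. Use the exists method instead; it returns true as soon as the predicate holds for any element. Also, I think you want l(1), not l(0), for the second check, since the values after ':' are in the second element of the split: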

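val links = linksFile.filter(t => {
  // "1: 2 3 5" -> Array("1", "2 3 5")
  val l = t.split(": ")
  // keep the line if the value before ':' is in root...
  root.contains(l(0).toInt) ||
    // ...or if any of the values after ':' is in root
    l(1).split(" ").exists { x =>
      root.contains(x.toInt)
    }
})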
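Since the predicate passed to filter is plain Scala, it can be checked on an ordinary collection before running it on the RDD. The sketch below is one way to do that, under a few assumptions that are not in the question: keepLine is a hypothetical helper name, root is converted to a Set[Int] for constant-time lookups, and each line is split on the first ':' so that a line with nothing after the colon yields no values instead of throwing.

```scala
// Spark-free sketch of the filter predicate from the answer above.
// Assumptions: keepLine is a hypothetical helper; root is passed as a Set[Int].
object LinkFilter {

  /** True if the key before ':' or any value after it is contained in root. */
  def keepLine(line: String, root: Set[Int]): Boolean = {
    // Split on the first ':' only; limit 2 keeps a trailing empty part ("5:" -> "5", "").
    val parts = line.split(":", 2)
    val key = parts(0).trim.toInt
    // Parse the whitespace-separated values, ignoring empty tokens.
    val values =
      if (parts.length > 1) parts(1).trim.split("\\s+").filter(_.nonEmpty).map(_.toInt)
      else Array.empty[Int]
    // exists stops at the first value found in root.
    root.contains(key) || values.exists(root.contains)
  }

  def main(args: Array[String]): Unit = {
    val root = Set(1, 4, 8, 9)
    val lines = Seq("1: 2 3 5", "2: 5 6 7", "3: 1 8 9", "4: 1 2 4")
    // Prints "1: 2 3 5", "3: 1 8 9" and "4: 1 2 4", one per line.
    lines.filter(keepLine(_, root)).foreach(println)
  }
}
```

With Spark, the same helper could be used by building the set once outside the closure, e.g. val rootSet = root.toSet followed by linksFile.filter(keepLine(_, rootSet)). On the sample data, lines 1 and 4 match on their key, line 3 matches because 1 appears among its values, and line 2 is dropped.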