raHul - 11 months ago 93

Scala Question

I have an RDD like the one below:

`val m = sc.parallelize(Seq(("a",("x",1)), ("a",("y",2)), ("a",("z",2)), ("b",("x",1)),("b",("y",2))))`

I transformed this RDD using `groupByKey`, as follows:

`val b = m.groupByKey.mapValues( _.toList)`

Result:

`(a,List((x,1), (y,2), (z,2)))`

`(b,List((x,1), (y,2)))`

Now I want to keep only the tuples that have the maximum value in each list.

So the expected result would be

`(a,List((y,2), (z,2)))`

`(b,List((y,2)))`

Answer Source

Treating the input as a plain Scala sequence rather than an RDD:
`val m = Seq(("a",("x",1)), ("a",("y",2)), ("a",("z",2)), ("b",("x",1)),("b",("y",2)))`

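```
val r1 =
  m.groupBy(_._1)
    // Drop the repeated key from each grouped element, keeping only (name, value).
    .map { case (k, v) => k -> v.map(_._2) }
    .map { case (k, v) =>
      k -> {
        // Sort by value, descending, so the maximum comes first.
        val sorted = v.sortWith { case (x, y) => x._2 > y._2 }
        val max = sorted.head._2
        // Keep the leading run of elements that share the maximum value.
        sorted.takeWhile(_._2 == max)
      }
    }
    .toList
```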

This gives the following result (the order of the keys is not guaranteed, since `groupBy` returns a `Map`):
`r1: List[(String, Seq[(String, Int)])] = List((b,List((y,2))), (a,List((y,2), (z,2))))`
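The same result can probably be reached without sorting, by finding the maximum first and then filtering on it. The sketch below is only one way to do that; `keepMax` and `r2` are hypothetical names introduced here for illustration, not part of the original answer.

```scala
// Hypothetical helper: keep only the pairs whose Int value equals the list's maximum.
// Returns the input unchanged when it is empty, so `max` is never called on an empty Seq.
def keepMax(pairs: Seq[(String, Int)]): Seq[(String, Int)] =
  if (pairs.isEmpty) pairs
  else {
    val max = pairs.map(_._2).max
    pairs.filter(_._2 == max)
  }

val m = Seq(("a", ("x", 1)), ("a", ("y", 2)), ("a", ("z", 2)), ("b", ("x", 1)), ("b", ("y", 2)))

// Group by key, strip the key from each element, then keep the maxima per group.
val r2: Map[String, Seq[(String, Int)]] =
  m.groupBy(_._1).map { case (k, v) => k -> keepMax(v.map(_._2)) }

// Assumption: on the RDD from the question, the same helper could be applied as
//   m.groupByKey.mapValues(vs => keepMax(vs.toList))
```

Because `filter` preserves the input order, tied maxima stay in the order they appeared in the original list, and the work per group is linear rather than a sort.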