S.Kang - 9 months ago 73

Scala Question

I'm interested in **Apache Spark.**

I am trying to sort a Spark RDD of arrays (`RDD[Array[Int]]`) in ascending order by an arbitrary column, in Scala.

(e.g.

`RDD[Array[Int]]` -> `Array(Array(1,2,3), Array(2,3,4), Array(1,2,1))`

If I sort by first column, then result will be

`Array(Array(1,2,3), Array(1,2,1), Array(2,3,4))`.

)

The result should still be of type `RDD[Array[Int]]`.

Is there a way to do this, for example using `map()` or `filter()`?

Answer Source

Use `RDD.sortBy`:

```
// sorting by second column (index = 1)
val result: RDD[Array[Int]] = rdd.sortBy(_(1), ascending = true)
```

The sorting function can also be written using pattern matching (this form assumes every row has exactly three elements):

```
val result: RDD[Array[Int]] = rdd.sortBy({
  case Array(a, b, c) => b /* choose column(s) to sort by */
}, ascending = true)
```
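If more than one column should decide the order, the key returned by the function can be a tuple, since Scala supplies an implicit `Ordering` for tuples of ordered elements. The following is a minimal sketch under a few assumptions: the object and helper names (`SortByTwoColumns`, `sortByFirstThenThird`), the app name, and the `local[*]` master are illustrative and not from the question, and every row is assumed to have exactly three elements (any other length would fail with a `MatchError`).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortByTwoColumns {
  // Hypothetical helper: sorts by the first column, breaking ties with the third.
  // Assumes every row has exactly three elements.
  def sortByFirstThenThird(rdd: RDD[Array[Int]]): RDD[Array[Int]] =
    rdd.sortBy { case Array(a, _, c) => (a, c) }

  def main(args: Array[String]): Unit = {
    // Local setup for illustration only; app name and master are assumptions.
    val conf = new SparkConf().setAppName("sort-by-two-columns").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd: RDD[Array[Int]] =
        sc.parallelize(Seq(Array(1, 2, 3), Array(2, 3, 4), Array(1, 2, 1)))

      // Print each sorted row on its own line.
      sortByFirstThenThird(rdd).collect().foreach(row => println(row.mkString(", ")))
    } finally {
      sc.stop()
    }
  }
}
```

The result type is still `RDD[Array[Int]]`; only the sort key changes, not the rows themselves.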

Also note that the `ascending` argument's default value is `true`, so you can drop it and get the same result:

```
val result: RDD[Array[Int]] = rdd.sortBy(_(1))
```
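Since the question asks about sorting by any column, the column index can be passed in as a parameter. Below is a rough sketch of one way to wrap `sortBy` for that. The names `SortByColumnExample` and `sortByColumn`, the app name, and the `local[*]` master are hypothetical choices for illustration, and the helper assumes every row is long enough to contain the requested index.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortByColumnExample {
  // Hypothetical helper: sorts rows by the value at a zero-based column index.
  // Assumes every row has more than `column` elements; a shorter row would
  // throw an ArrayIndexOutOfBoundsException when the job runs.
  def sortByColumn(
      rdd: RDD[Array[Int]],
      column: Int,
      ascending: Boolean = true): RDD[Array[Int]] =
    rdd.sortBy(row => row(column), ascending)

  def main(args: Array[String]): Unit = {
    // Local setup for illustration only; app name and master are assumptions.
    val conf = new SparkConf().setAppName("sort-by-column").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd: RDD[Array[Int]] =
        sc.parallelize(Seq(Array(1, 2, 3), Array(2, 3, 4), Array(1, 2, 1)))

      // Ascending by the third column (index 2).
      sortByColumn(rdd, column = 2).collect()
        .foreach(row => println(row.mkString(", ")))

      // Descending by the third column.
      sortByColumn(rdd, column = 2, ascending = false).collect()
        .foreach(row => println(row.mkString(", ")))
    } finally {
      sc.stop()
    }
  }
}
```

Rows that share the same key value (for example, two rows whose first column is `1`) may come back in either relative order, so add a tie-breaking column to the key if that order matters.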