S.Kang - 1 month ago
Scala Question

How to sort a Spark RDD of arrays in ascending order by any column in Scala?

I'm interested in Apache Spark.

I am trying to sort a Spark RDD of arrays in ascending order by an arbitrary column in Scala.

For example, given

RDD[Array[Int]] -> Array(Array(1,2,3), Array(2,3,4), Array(1,2,1))

if I sort by the first column, the result should be
Array(Array(1,2,3), Array(1,2,1), Array(2,3,4)),
and if I sort by the third column, the result should be
Array(Array(1,2,1), Array(1,2,3), Array(2,3,4)).

I want the result to have the type RDD[Array[Int]].
Is there a way to do this, for example using the map() or filter() function?

Answer

Use RDD.sortBy:

// sorting by second column (index = 1)
val result: RDD[Array[Int]] = rdd.sortBy(_(1), ascending = true)

The sorting function can also be written using pattern matching:

val result: RDD[Array[Int]] = rdd.sortBy( {
  case Array(a, b, c) => b /* choose column(s) to sort by */
}, ascending = true)
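
If more than one column should decide the order, one option is to return a tuple from the pattern match, since Scala provides an implicit Ordering for tuples that compares them element by element. The following is a minimal sketch, assuming a local Spark setup; the object name SortByTwoColumnsExample, the helper sortByFirstThenThird and the local[*] master are illustrative names and settings, not part of the original answer:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortByTwoColumnsExample {
  // Hypothetical helper: sort by the first column and break ties with the third.
  // The pattern only matches rows with exactly three elements; any other
  // row length would throw a MatchError when the job runs.
  def sortByFirstThenThird(rdd: RDD[Array[Int]]): RDD[Array[Int]] =
    rdd.sortBy { case Array(a, _, c) => (a, c) }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out on one machine.
    val conf = new SparkConf().setAppName("sort-by-two-columns").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(Array(1, 2, 3), Array(2, 3, 4), Array(1, 2, 1)))
      // Sort keys are (1,3), (2,4), (1,1), so the rows come back as
      // Array(1, 2, 1), Array(1, 2, 3), Array(2, 3, 4).
      sortByFirstThenThird(rdd).collect()
        .foreach(row => println(row.mkString("Array(", ", ", ")")))
    } finally {
      sc.stop()
    }
  }
}
```

A tuple key is also a simple way to make the order of rows with equal values in the main column predictable.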

Also note the ascending argument's default value is true, so you can drop it and get the same result:

val result: RDD[Array[Int]] = rdd.sortBy(_(1))
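
To try this end to end on the data from the question, something like the sketch below should work. It assumes Spark is on the classpath and runs in local mode; the object name SortByColumnExample and the helper sortByColumn are hypothetical and only wrap RDD.sortBy so the column index can be passed as a parameter:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SortByColumnExample {
  // Hypothetical helper: sort rows by the value at index `col`.
  // Rows shorter than col + 1 would fail with an index-out-of-bounds
  // exception when the job runs.
  def sortByColumn(rdd: RDD[Array[Int]], col: Int, ascending: Boolean = true): RDD[Array[Int]] =
    rdd.sortBy(row => row(col), ascending)

  def main(args: Array[String]): Unit = {
    // Local master is an assumption for trying this out on one machine.
    val conf = new SparkConf().setAppName("sort-by-column").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(Array(1, 2, 3), Array(2, 3, 4), Array(1, 2, 1)))

      // Third column (index 2) holds 3, 4, 1, so ascending order is
      // Array(1, 2, 1), Array(1, 2, 3), Array(2, 3, 4).
      val byThird: RDD[Array[Int]] = sortByColumn(rdd, 2)
      byThird.collect().foreach(row => println(row.mkString("Array(", ", ", ")")))

      // Passing ascending = false reverses the order.
      val byThirdDesc: RDD[Array[Int]] = sortByColumn(rdd, 2, ascending = false)
      byThirdDesc.collect().foreach(row => println(row.mkString("Array(", ", ", ")")))
    } finally {
      sc.stop()
    }
  }
}
```

Note that when several rows share the same value in the sort column (as with the first column here), their relative order in the result is not something I would rely on.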