BDR BDR - 2 months ago 23
Scala Question

Get max record from RDD

I want to get the maximum value from an RDD using RDD.max in Scala. My RDD contains a bunch of VOs with a duration field (of type Long). I tried the following code, but it works only with Int and not with Long (as per the documentation).

val vo1 = new MyVO()
vo1.setDuration(1234L)

val vo2 = new MyVO()
vo2.setDuration(123L)

val a = Array(vo1, vo2)
val sc = prepareConfig()
val rdd = sc.parallelize(a)

val maxKey2 = rdd.max()(new Ordering[MyVO]() {
  override def compare(x: MyVO, y: MyVO): Long =
    Ordering[Long].compare(x.duration, y.duration)
})

println(maxKey2.duration)


I'm referring to the post "How to find max value in pair RDD?", but I don't know how to deal with Long in my case. Any help is highly appreciated.

Answer

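The result of compare is always an Int, no matter which types you compare (have a look at the definition of compare in trait Ordering): negative if x is less than y, zero if they are equal, and positive if x is greater than y. Declaring the return type as Long therefore does not match the method you are overriding, and the code does not compile.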

As you are comparing Long values, the compare function can be simplified to:

override def compare(x: MyVO, y: MyVO): Int =
    x.duration.compareTo(y.duration)
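
For reference, here is a minimal self-contained sketch of the same idea that runs without Spark. The case class MyVO below is only a hypothetical stand-in for the class in the question (which uses a setter), and the names MaxByDuration, explicitOrdering and byDuration are made up for illustration. It shows an explicit Ordering whose compare returns Int, plus an equivalent one built with Ordering.by.

```scala
// Hypothetical stand-in for the MyVO class from the question.
final case class MyVO(duration: Long)

object MaxByDuration {
  // compare must return Int, even though the compared field is a Long.
  val explicitOrdering: Ordering[MyVO] = new Ordering[MyVO] {
    override def compare(x: MyVO, y: MyVO): Int =
      java.lang.Long.compare(x.duration, y.duration)
  }

  // Equivalent ordering derived from the Long field via the built-in Ordering[Long].
  val byDuration: Ordering[MyVO] = Ordering.by[MyVO, Long](_.duration)

  def main(args: Array[String]): Unit = {
    val records = Seq(MyVO(1234L), MyVO(123L))

    // A plain Seq stands in for the RDD here; both accept an Ordering for max.
    println(records.max(explicitOrdering).duration)
    println(records.max(byDuration).duration)
  }
}
```

With Spark, the same ordering should be usable as rdd.max()(MaxByDuration.byDuration), since RDD.max takes an implicit Ordering of the element type.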