Stefan Endrullis Stefan Endrullis - 4 months ago 74
Scala Question

Simplest way to get the top n elements of a Scala Iterable

Is there a simple and efficient solution to determine the top n elements of a Scala Iterable? I mean something like

iter.toList.sortBy(_.myAttr).take(2)


but without having to sort all elements when only the top 2 are of interest. Ideally I'm looking for something like

iter.top(2, _.myAttr)


see also: Solution for the top element using an Ordering: In Scala, how to use Ordering[T] with List.min or List.max and keep code readable

Update:



Thank you all for your solutions. Finally, I took the original solution of user unknown and adopted it to use
Iterable
and the pimp-my-library pattern:

implicit def iterExt[A](iter: Iterable[A]) = new {
def top[B](n: Int, f: A => B)(implicit ord: Ordering[B]): List[A] = {
def updateSofar (sofar: List [A], el: A): List [A] = {
//println (el + " - " + sofar)

if (ord.compare(f(el), f(sofar.head)) > 0)
(el :: sofar.tail).sortBy (f)
else sofar
}

val (sofar, rest) = iter.splitAt(n)
(sofar.toList.sortBy (f) /: rest) (updateSofar (_, _)).reverse
}
}

case class A(s: String, i: Int)
val li = List (4, 3, 6, 7, 1, 2, 9, 5).map(i => A(i.toString(), i))
println(li.top(3, _.i))

Answer

My solution (bound to Int, but should be easily changed to Ordered (a few minutes please):

def top (n: Int, li: List [Int]) : List[Int] = {

  def updateSofar (sofar: List [Int], el: Int) : List [Int] = {
    // println (el + " - " + sofar)
    if (el < sofar.head) 
      (el :: sofar.tail).sortWith (_ > _) 
    else sofar
  }

  /* better readable:
    val sofar = li.take (n).sortWith (_ > _)
    val rest = li.drop (n)
    (sofar /: rest) (updateSofar (_, _)) */    
  (li.take (n). sortWith (_ > _) /: li.drop (n)) (updateSofar (_, _)) 
}

usage:

val li = List (4, 3, 6, 7, 1, 2, 9, 5)    
top (2, li)
  • For above list, take the first 2 (4, 3) as starting TopTen (TopTwo).
  • Sort them, such that the first element is the bigger one (if any).
  • repeatedly iterate through the rest of the list (li.drop(n)), and compare the current element with the maximum of the list of minimums; replace, if neccessary, and resort again.
  • Improvements:
    • Throw away Int, and use ordered.
    • Throw away (_ > _) and use a user-Ordering to allow BottomTen. (Harder: pick the middle 10 :) )
    • Throw away List, and use Iterable instead

update (abstraction):

def extremeN [T](n: Int, li: List [T])
  (comp1: ((T, T) => Boolean), comp2: ((T, T) => Boolean)):
     List[T] = {

  def updateSofar (sofar: List [T], el: T) : List [T] =
    if (comp1 (el, sofar.head)) 
      (el :: sofar.tail).sortWith (comp2 (_, _)) 
    else sofar

  (li.take (n) .sortWith (comp2 (_, _)) /: li.drop (n)) (updateSofar (_, _)) 
}

/*  still bound to Int:  
def top (n: Int, li: List [Int]) : List[Int] = {
  extremeN (n, li) ((_ < _), (_ > _))
}
def bottom (n: Int, li: List [Int]) : List[Int] = {
  extremeN (n, li) ((_ > _), (_ < _))
}
*/

def top [T] (n: Int, li: List [T]) 
  (implicit ord: Ordering[T]): Iterable[T] = {
  extremeN (n, li) (ord.lt (_, _), ord.gt (_, _))
}
def bottom [T] (n: Int, li: List [T])
  (implicit ord: Ordering[T]): Iterable[T] = {
  extremeN (n, li) (ord.gt (_, _), ord.lt (_, _))
}

top (3, li)
bottom (3, li)
val sl = List ("Haus", "Garten", "Boot", "Sumpf", "X", "y", "xkcd", "x11")
bottom (2, sl)

To replace List with Iterable seems to be a bit harder.

As Daniel C. Sobral pointed out in the comments, a high n in topN can lead to much sorting work, so that it could be useful, to do a manual insertion sort instead of repeatedly sorting the whole list of top-n elements:

def extremeN [T](n: Int, li: List [T])
  (comp1: ((T, T) => Boolean), comp2: ((T, T) => Boolean)):
     List[T] = {

  def sortedIns (el: T, list: List[T]): List[T] = 
    if (list.isEmpty) List (el) else 
    if (comp2 (el, list.head)) el :: list else 
      list.head :: sortedIns (el, list.tail)

  def updateSofar (sofar: List [T], el: T) : List [T] =
    if (comp1 (el, sofar.head)) 
      sortedIns (el, sofar.tail)
    else sofar

  (li.take (n) .sortWith (comp2 (_, _)) /: li.drop (n)) (updateSofar (_, _)) 
}

top/bottom method and usage as above. For small groups of top/bottom Elements, the sorting is rarely called, a few times in the beginning, and then less and less often over time. For example, 70 times with top (10) of 10 000, and 90 times with top (10) of 100 000.