vkubicki - 26 days ago

Scala Question

Scala slow hashmap

For Project Euler problem 14, I created this answer:

import scala.collection.immutable.LongMap

object LongestCollatzSequencePerso {
  def nextElem(n: Long): Long = n match {
    case x if n % 2 == 0 => n / 2
    case _ => 3 * n + 1
  }

  def funcVal(acc: LongMap[Long], n: Long): LongMap[Long] = {
    if (acc.contains(n)) {
      return acc
    }
    else {
      val nNext = nextElem(n)
      val size = funcVal(acc, nNext)(nNext) + 1
      return acc + (n -> size)
    }
  }

  def main(args: Array[String]): Unit = {
    val max = 1000000L
    val allVal = (1L to max).foldLeft(LongMap(1L -> 1L))(funcVal)
    println(allVal.filter(_._1 < max).maxBy(_._2)._1)
  }
}


I use an immutable LongMap that caches every result computed so far, so the recursion can stop as soon as a cached value is found. My code is very slow, and I cannot get the result.

Now this code, taken from the Internet, does not cache anything:

object LongestCollatzSequenceWeb {
  def from(n: Long, c: Int = 0): Int =
    if (n == 1) c + 1
    else from(if (n % 2 == 0) n / 2 else 3 * n + 1, c + 1)

  val r = (1 until 1000000).view
    .map(n => (n, from(n)))
    .reduceLeft((a, b) => if (a._2 > b._2) a else b)
    ._1

  def main(args: Array[String]): Unit = println(r)
}


But it runs fast enough to get the correct answer in a short time.

Why is my cached version so slow? I understand that caching creates its own overhead, but I was hoping to get a result in a reasonable time anyway. Do you see a way I could improve performance while keeping everything immutable?

I also created this tail-recursive version (as suggested in an answer), but it is very slow too:

import scala.annotation.tailrec
import scala.collection.immutable.LongMap

object LongestCollatzSequenceTailRec {
  def nextElem(n: Long): Long = n match {
    case x if n % 2 == 0 => n / 2
    case _ => 3 * n + 1
  }

  @tailrec
  def funcVal(acc: (List[Long], LongMap[Long]), n: Long): (List[Long], LongMap[Long]) = {
    val (previous, dic) = acc
    if (dic.contains(n)) {
      val disN = dic(n)
      val dis = disN + 1 to disN + previous.length
      (Nil, dic ++ previous.zip(dis))
    }
    else {
      funcVal((n :: previous, dic), nextElem(n))
    }
  }

  def main(args: Array[String]): Unit = {
    val max = 1000000L
    val allVal = (1L to max).foldLeft((List[Long](), LongMap(1L -> 1L)))(funcVal)
    println(allVal._2.filter(_._1 < max).maxBy(_._2)._1)
  }
}

Answer

Iteration #1: Initial implementation

It turns out that the version you posted is not tail recursive: just add the @tailrec annotation to your funcVal method and it will not compile, because the recursive call is not in tail position.
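For instance, a minimal sketch of that check (the object name is mine; the body is your funcVal): scalac rejects it because the result of the recursive call is still used afterwards.

import scala.annotation.tailrec
import scala.collection.immutable.LongMap

object TailRecCheck {
  def nextElem(n: Long): Long = if (n % 2 == 0) n / 2 else 3 * n + 1

  // Fails to compile with an error along the lines of:
  // "could not optimize @tailrec annotated method funcVal:
  //  it contains a recursive call not in tail position"
  @tailrec
  def funcVal(acc: LongMap[Long], n: Long): LongMap[Long] =
    if (acc.contains(n)) acc
    else {
      val nNext = nextElem(n)
      val size = funcVal(acc, nNext)(nNext) + 1 // result of the call is incremented, so it is not a tail call
      acc + (n -> size)
    }
}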

By contrast, the method from in LongestCollatzSequenceWeb is tail recursive (also verified by adding @tailrec).

So we end up comparing apples with oranges, or the performance of a deeply recursive method with that of an effectively iterative one :)
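To see why that matters, a tail call is compiled into a jump, so from behaves roughly like the hand-written loop below (my own illustration, not decompiled output): constant stack depth and no per-step allocation.

// Rough loop equivalent of the tail-recursive from:
def fromLoop(start: Long): Int = {
  var n = start
  var c = 0
  while (n != 1) {
    n = if (n % 2 == 0) n / 2 else 3 * n + 1
    c += 1
  }
  c + 1
}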

Iteration #2: Even after the @tailrec modification you should clearly see that you allocate a huge amount of memory. Let's prove that with some simple logging of garbage collection time:

import java.lang.management.ManagementFactory
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.JavaConverters._

val scheduler = Executors.newScheduledThreadPool(1)
scheduler.scheduleAtFixedRate(new Runnable {

  override def run(): Unit = {
    val totalTime = ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionTime).sum
    println("Spent time for GC: " + totalTime)
  }
}, 0, 5, TimeUnit.SECONDS)
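One practical note of mine (not from the original answer): newScheduledThreadPool uses non-daemon threads, so stop the scheduler once the measured code has returned, otherwise the JVM will not exit.

// My own addition: run the code under measurement, then stop the GC logger.
try LongestCollatzSequencePerso.main(Array.empty[String])
finally scheduler.shutdown()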

Let's run your code with val max = 1000000L. We will not wait an eternity for it to finish, but we will see the following:

Spent time for GC: 0
Spent time for GC: 47
Spent time for GC: 67
Spent time for GC: 107
Spent time for GC: 157
Spent time for GC: 201
......................
Spent time for GC: 940
Spent time for GC: 988
Spent time for GC: 1034
......................

So in short order we ended up spending more than a second on GC! Moreover, it indicates that garbage collection happened quite frequently.
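If you want to see that frequency directly, the same MXBeans also expose a collection count; for example, one extra line inside the run() above (an addition of mine, not part of the original measurement):

// How many collections have run so far, summed over all collectors:
val totalCount = ManagementFactory.getGarbageCollectorMXBeans.asScala.map(_.getCollectionCount).sum
println("Number of GC runs so far: " + totalCount)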

By contrast, let's try the 'code from the internet' with a limit of 100000000 (100 times the original):

Spent time for GC: 55
Spent time for GC: 58
Spent time for GC: 60
Spent time for GC: 64

As you can see, we have far fewer garbage collections (since we allocate less memory) and the pace of growth is slower (+2 to +4 ms per 5 seconds, compared to +40 to +70 in the previous sample).

Hope this helps and clearly points to the flaw in your current solution: the sheer amount of allocation and the GC pressure it causes.
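For completeness, here is one way to keep a cache while allocating far less (a sketch of mine, not part of the original answer, and note that it gives up the immutability you asked about): a single scala.collection.mutable.LongMap plus an explicit chain list, instead of building a new immutable map on every insertion.

import scala.collection.mutable

object LongestCollatzSequenceMutableCache {
  // Single shared cache: Collatz chain length for every value seen so far.
  val cache = mutable.LongMap[Long](1L -> 1L)

  def next(n: Long): Long = if (n % 2 == 0) n / 2 else 3 * n + 1

  // Walk forward until a cached value is hit, then fill in the lengths
  // for the collected chain on the way back.
  def chainLength(start: Long): Long = {
    var chain = List.empty[Long] // head = element closest to the cached value
    var n = start
    while (!cache.contains(n)) {
      chain = n :: chain
      n = next(n)
    }
    var len = cache(n)
    for (m <- chain) {
      len += 1
      cache(m) = len
    }
    len // length of the chain starting at `start`
  }

  def main(args: Array[String]): Unit = {
    val max = 1000000L
    println((1L until max).maxBy(chainLength))
  }
}

The idea is the same memoization you had in mind, but each cached value is written in place, so the only per-number allocation is the temporary chain list.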
