Randomize Randomize - 11 months ago 57
Scala Question

Scala: memory usage for Stream and List across transformations

I have a doubt about how Scala allocates memory across transformations using

. Let's take a simple example like this written using both

Stream(1,2,3,4,5).map(_ + 3).filter (_ % 2 ==0).toList

List(1,2,3,4,5).map(_ + 3).filter (_ % 2 ==0)

Using the
will allocate in memory first a new "structure" (let's say another
) after the
then another
after the filter, for a total of 2
s, plus for each of those lists the contained elements allocated?

In the case of the
will it be only one?

If both are "yes", why Scala doesn't use
behaviour as default behaviour for transformations?

Answer Source

Lazy evaluation makes it hard to reason about when things will be evaluated and Laziness does not work well with Side effects

Memory consumption of lazy lists (in general lazy data structures) is hard to deal with because they result in accumulation of unevaluated thunks in the memory.

Laziness does not work well with side effects. In case of side effects when does the side effect happen in the computation matters. As thunks evaluation is deferred the time of side effecting is hard to reason about. for example it matters when a global variable is read/written. it matters when a db write/read happens and when printing to console happens

Thunk means deferred computation to be evaluated or unevaluated expression.

Lazy streams (or lazy lists) result in thunks (description of computation) which take up memory. These thunks are not evaluated until and unless explicitly asked for. During the course of execution of the programs memory consumption might go really high because of accumulation of unevaluated thunks in memory. This is the classic problem with haskell lists as well. By default haskell is lazy. Thats why there is a strict version of fold and a lazy version of fold in haskell to deal with this kind of problems.

Lets examine what happens with Stream code

Here is the code of filter from standard lib

override def filter(p: A => Boolean): Stream[A] = {
    // optimization: drop leading prefix of elems for which f returns false
    // var rest = this dropWhile (!p(_)) - forget DRY principle - GC can't collect otherwise
    var rest = this
    while (!rest.isEmpty && !p(rest.head)) rest = rest.tail
    // private utility func to avoid `this` on stack (would be needed for the lazy arg)
    if (rest.nonEmpty) Stream.filteredTail(rest, p)
    else Stream.Empty

private[immutable] def filteredTail[A](stream: Stream[A], p: A => Boolean) = {
    cons(stream.head, stream.tail filter p)

Here is the lazily evaluated thunk tl

/** A lazy cons cell, from which streams are built. */
  final class Cons[+A](hd: A, tl: => Stream[A]) extends Stream[A] {
    override def isEmpty = false
    override def head = hd
    @volatile private[this] var tlVal: Stream[A] = _
    @volatile private[this] var tlGen = tl _
    def tailDefined: Boolean = tlGen eq null
    override def tail: Stream[A] = {
      if (!tailDefined)
        synchronized {
          if (!tailDefined) {
            tlVal = tlGen()
            tlGen = null


tl is the chunk which remains in the memory until something triggers its evaluation. Thunk may also produce multiple other thunk which may remain unevaluated taking memory.

Stream(1, 2, 3, 4, 5, 6).filter(_ % 2 == 0)

cons(2, thunk)

cons(2, cons(4, thunk)

cons(2, 4, 6, thunk)