Scala Question

Why is iterating through flattened iterator slow?

Scala 2.11.8


I'm measuring iteration through a flattened and a non-flattened iterator. I wrote the following benchmark:

import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations._
import org.openjdk.jmh.infra.Blackhole

@State(Scope.Benchmark)
class SerializeBenchmark {

  var list = List(
    List("test", 12, 34, 56),
    List("test-test-test", 123, 444, 0),
    List("test-test-test-tes", 145, 443, 4333),
    List("testdsfg-test-test-tes", 3145, 435, 333),
    List("test-tessdfgsdt-tessdfgt-tes", 1455, 43, 333),
    List("tesewrt-test-tessdgdsft-tes", 13345, 4533, 3222333),
    List("ewrtes6yhgfrtyt-test-test-tes", 122245, 433444, 322233),
    List("tserfest-test-testtryfgd-tes", 143345, 43, 3122233),
    List("test-reteytest-test-tes", 1121145, 4343, 3331212),
    List("test-test-ertyeu6test-tes", 14115, 4343, 33433),
    List("test-lknlkkn;lkntest-ertyeu6test-tes", 98141115, 4343, 33433),
    List("tkknknest-test-ertyeu6test-tes", 914111215, 488343, 33433),
    List("test-test-ertyeu6test-tes", 1411125, 437743, 93433),
    List("test-test-ertyeu6testo;kn;lkn;lk-tes", 14111215, 5409343, 39823),
    List("telnlkkn;lnih98st-test-ertyeu6test-tes", 1557215, 498343, 3377433)
  )

  @Benchmark
  @OutputTimeUnit(TimeUnit.NANOSECONDS)
  @BenchmarkMode(Array(Mode.AverageTime))
  def flattenerd(bh: Blackhole): Any = {
    list.iterator.flatten.foreach(bh.consume)
  }

  @Benchmark
  @OutputTimeUnit(TimeUnit.NANOSECONDS)
  @BenchmarkMode(Array(Mode.AverageTime))
  def raw(bh: Blackhole): Any = {
    list.iterator.foreach(_.foreach(bh.consume))
  }
}


After running these benchmarks several times I got the following results:

Benchmark                      Mode  Cnt      Score      Error  Units
SerializeBenchmark.flattenerd  avgt    5  10311,373 ± 1189,448  ns/op
SerializeBenchmark.raw         avgt    5   3463,902 ±  141,145  ns/op


Almost a 3x difference in performance, and the larger I make the source list, the bigger the difference gets. Why?

I expected some performance difference, but not 3x.

Answer

I re-ran your test with a few more iterations under JMH's hs_gc profiler.

These are the results:

[info] Benchmark                                                    Mode  Cnt  Score   Error  Units
[info] IteratorFlatten.flattenerd                                   avgt   50  0.708 ± 0.120  us/op
[info] IteratorFlatten.flattenerd:·sun.gc.collector.0.invocations   avgt   50  8.840 ± 2.259      ?
[info] IteratorFlatten.raw                                          avgt   50  0.367 ± 0.014  us/op
[info] IteratorFlatten.raw:·sun.gc.collector.0.invocations          avgt   50  0                  ?

IteratorFlatten.flattenerd averaged 8 GC cycles during the test runs, where raw had 0. The flatten method on Iterator is provided by the FlattenOps wrapper class, and its methods, particularly hasNext, allocate an iterator per inner list. The noise generated by all that allocation is why we suffer in running time.
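To see where those allocations come from, here is a simplified sketch (not the actual standard-library source, and the names are mine) of what a flattening iterator has to do: for every inner collection it must obtain a fresh inner iterator, which is one object allocation per list.

```scala
// Simplified sketch of a flattening iterator. For each element of the
// outer iterator we allocate a new inner iterator -- this per-list
// allocation is the GC pressure the profiler run shows.
def flattenSketch[A](outer: Iterator[Iterable[A]]): Iterator[A] =
  new Iterator[A] {
    private var cur: Iterator[A] = Iterator.empty

    def hasNext: Boolean = {
      // Advance past exhausted (or empty) inner collections.
      while (!cur.hasNext && outer.hasNext)
        cur = outer.next().iterator // allocates an inner iterator
      cur.hasNext
    }

    def next(): A =
      if (hasNext) cur.next() else Iterator.empty.next()
  }
```

Used as `flattenSketch(list.iterator)`, this behaves like `list.iterator.flatten`: same elements in the same order, but you can count one iterator allocation per inner list plus the wrapper itself.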

If I re-run the test and give it a minimum heap size of 2 GB, the results get closer:

[info] Benchmark                   Mode  Cnt  Score   Error  Units
[info] IteratorFlatten.flattenerd  avgt   50  0.615 ± 0.041  us/op
[info] IteratorFlatten.raw         avgt   50  0.434 ± 0.064  us/op

The gist of it is: the more you allocate, the more work the GC has to do, the more pauses you get, and the slower the execution.
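If you really want to avoid allocation on the hot path, you can skip iterators entirely and walk the cons cells of the lists with while loops. This is a minimal sketch (the helper name is mine, not from the original benchmark); it creates no per-element or per-list wrapper objects at all:

```scala
// Allocation-free traversal of a List[List[A]]: walk the cons cells
// directly with while loops instead of creating iterators.
def consumeAll[A](outer: List[List[A]])(consume: A => Unit): Unit = {
  var os = outer
  while (os.nonEmpty) {
    var is = os.head
    while (is.nonEmpty) {
      consume(is.head) // visit each element in order
      is = is.tail
    }
    os = os.tail
  }
}
```

In the benchmark above this would be called as `consumeAll(list)(bh.consume)`. Whether the win over `raw` is measurable depends on how well the JIT already eliminates the short-lived iterators, so benchmark it rather than assume.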

Note that these kinds of micro benchmarks are very fragile and may yield different results. Make sure you run enough iterations for the statistics to become significant.
