Vasco Vasco - 1 month ago 16
Scala Question

scala parallel collections not consistent

I am getting inconsistent answers from the following code which I find odd.

import scala.math.pow

val p = 2
val a = Array(1,2,3)

println(a.par
.aggregate("0")((x, y) => s"$y pow $p; ", (x, y) => x + y))

for (i <- 1 to 100) {
println(a.par
.aggregate(0.0)((x, y) => pow(y, p), (x, y) => x + y) == 14)
}

a.map(x => pow(x,p)).sum


In the code the
a.par ...
computes
14
or
10
. Can anyone provide an explanation for why it is computing inconsistently?

Answer

In your "seqop" function, that is the first function you pass to aggregate, you define the logic that is used to combine elements within the same partition. Your function looks like this:

(x, y) => pow(y, p)

The problem is that you don't accumulate the results of a partition. Instead, you throw away your accumulator x. Every time you get 10 as a result, the calculation 2^2 was dropped.

If you change your function to take the accumulated value into account, you will get 14 every time:

(x, y) => x + pow(y, p)