tkachuko tkachuko - 1 year ago 87
Scala Question

Behavior of scala fold in parallel collections

Let's run the following line of code several times:

Set(1,2,3,4,5,6,7).par.fold(0)(_ - _)

The results are quite interesting:

scala> Set(1,2,3,4,5,6,7).par.fold(0)(_ - _)
res10: Int = 8
scala> Set(1,2,3,4,5,6,7).par.fold(0)(_ - _)
res11: Int = 20

However clearly it should be like in its sequential version:

scala> Set(1,2,3,4,5,6,7).fold(0)(_ - _)
res15: Int = -28

I understand that operation
is non-associative on integers and that's the reason behind such behavior, but my question is quite simple: doesn't it mean that
should not be parallelized in
implementation of collections?

Answer Source

When you look at the standard library documentation, you see that fold is undeterministic here:

Folds the elements of this sequence using the specified associative binary operator. The order in which operations are performed on elements is unspecified and may be nondeterministic.

As an alternative, there's foldLeft:

Applies a binary operator to a start value and all elements of this sequence, going left to right. Applies a binary operator to a start value and all elements of this collection or iterator, going left to right.

Note: might return different results for different runs, unless the underlying collection type is ordered or the operator is associative and commutative.

As Set is not an ordered collection, there's no canonical order in which the elements could be folded, so the standard library allows itself to be undeterministic even for foldLeft. If you would use an ordered sequence here, foldLeft would be deterministic in that case.