I found that Spark's RDD.fold and Scala's List.fold behave differently with the same input.
List(1, 2, 3, 4).fold(1)(_ + _) // res0: Int = 11
sc.parallelize(List(1, 2, 3, 4)).fold(1)(_ + _) // res1: Int = 15
It is not a bug; you're just not using fold the right way.
The zeroValue should be neutral with respect to op, meaning it has to satisfy the following condition:

op(x, zeroValue) === op(zeroValue, x) === x

If op is +, then the right choice is 0.
Why this restriction? Because fold can be executed in parallel, each partition has to initialize its own copy of zeroValue, and one more copy is used when the partial results are merged. In your example, the RDD has four partitions, so zeroValue = 1 is applied five times in total: 10 + 4 + 1 = 15. More formally, you can think of op as a monoid operation • and zeroValue as its identity element.
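To make the per-partition behavior concrete without a Spark cluster, here is a plain-Scala sketch (the helper `sparkLikeFold` and the hard-coded partition layout are my own illustration, not Spark's actual implementation) that folds each "partition" with zeroValue and then folds the partial results with zeroValue once more:

```scala
object FoldDemo {
  // Simulates Spark's RDD.fold: zeroValue is used once per partition
  // and once more when the per-partition results are merged.
  def sparkLikeFold[A](partitions: List[List[A]], zero: A)(op: (A, A) => A): A = {
    val perPartition = partitions.map(_.fold(zero)(op)) // zero used inside each partition
    perPartition.fold(zero)(op)                         // zero used again for the merge
  }

  def main(args: Array[String]): Unit = {
    // Assume the RDD was split into four single-element partitions.
    val parts = List(List(1), List(2), List(3), List(4))
    println(sparkLikeFold(parts, 1)(_ + _)) // 15 = 10 + (4 partitions + 1 merge) * 1
    println(sparkLikeFold(parts, 0)(_ + _)) // 10: a neutral zeroValue leaves the sum intact
  }
}
```

With a neutral zeroValue of 0, the result matches the plain List.fold sum regardless of how many partitions Spark happens to use.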