Miguel Gamboa - 1 month ago
Java Question

Where is the combination order of the combiner in collect(supplier, accumulator, combiner) defined?

The Java API documentation states that the combiner parameter of the collect method must be:


an associative, non-interfering, stateless function for combining two values, which must be compatible with the accumulator function


A combiner is a BiConsumer<R,R> that receives two parameters of type R and returns void. But the documentation does not state whether we should combine the elements into the first or the second parameter.

For instance, the following example may give different results depending on whether the combiner is m1.addAll(m2) or m2.addAll(m1).

List<String> res = LongStream
    .rangeClosed(1, 1_000_000)
    .parallel()
    .mapToObj(n -> "" + n)
    .collect(ArrayList::new, ArrayList::add, (m1, m2) -> m1.addAll(m2));


I know that in this case we could simply use a method reference, such as ArrayList::addAll. Yet there are cases where a lambda is required and we must combine the items in the correct order; otherwise we could get an inconsistent result when processing in parallel.
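To make the "lambda is required" case concrete, here is a minimal sketch of a word count whose combiner cannot be written as a plain method reference, because merging two maps has to sum the counts. The class name WordCountCombiner, the method countWords, and the sample input are hypothetical and exist only for illustration. The sketch assumes the combiner should fold its second argument into its first, which is exactly the convention this question is asking about.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordCountCombiner {

    // Counts how often each word occurs, using the three-argument collect.
    public static Map<String, Integer> countWords(List<String> words) {
        return words.parallelStream().collect(
            HashMap::new,
            // Accumulator: add one occurrence of the word to the container.
            (map, word) -> map.merge(word, 1, Integer::sum),
            // Combiner (assumption): fold the second container into the first.
            (left, right) -> right.forEach((w, c) -> left.merge(w, c, Integer::sum)));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "a", "c", "b", "a");
        System.out.println(countWords(words));
    }
}
```

Map::putAll would not work as the combiner here, since it would overwrite the counts for words that appear in both partial maps instead of summing them.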

Is this stated anywhere in the Java 8 API documentation? Or does it really not matter?

Answer

It seems that this is not explicitly stated in the documentation. However, the Streams API has a concept of ordering: a stream can be either ordered or unordered. It may be unordered from the very beginning if the source spliterator is unordered (for example, if the stream source is a HashSet), or it may become unordered if the user explicitly calls unordered(). If the stream is ordered, then the collection procedure should also be stable; thus, I guess, it is assumed that for ordered streams the combiner receives its arguments in strict encounter order. This is not guaranteed for an unordered stream, however.
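The ordered versus unordered distinction can be checked with a small experiment. The sketch below is only an illustration: the class name CombinerOrderDemo and both method names are made up, and the combiner is assumed to append the second list to the first. For the ordered stream the result should match the sequential encounter order; for the unordered one, only the set of elements is compared, since the order is not something to rely on.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.stream.LongStream;

public class CombinerOrderDemo {

    // Ordered parallel stream: the combiner appends the second list to the first.
    public static List<String> collectOrdered(long n) {
        return LongStream.rangeClosed(1, n)
            .parallel()
            .mapToObj(i -> "" + i)
            .collect(ArrayList::new, ArrayList::add, (left, right) -> left.addAll(right));
    }

    // Same pipeline, but the ordering constraint is dropped with unordered().
    public static List<String> collectUnordered(long n) {
        return LongStream.rangeClosed(1, n)
            .parallel()
            .unordered()
            .mapToObj(i -> "" + i)
            .collect(ArrayList::new, ArrayList::add, (left, right) -> left.addAll(right));
    }

    public static void main(String[] args) {
        long n = 100_000;

        // Reference result built sequentially, in encounter order.
        List<String> expected = new ArrayList<>();
        for (long i = 1; i <= n; i++) {
            expected.add("" + i);
        }

        System.out.println("ordered result matches encounter order: "
            + collectOrdered(n).equals(expected));

        // For the unordered stream only the contents are compared, not the order.
        System.out.println("unordered result has the same elements: "
            + new HashSet<>(collectUnordered(n)).equals(new HashSet<>(expected)));
    }
}
```

Note that an unordered stream may still happen to produce the elements in order on a given run; the point is only that nothing promises it.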