I have been looking and I cannot find an example or discussion of the
val list = Seq(("one", "i"), ("two", "2"), ("two", "ii"), ("one", "1"), ("four", "iv"))
Seq(("one" -> Seq("i","1")), ("two" -> Seq("2", "ii")), ("four" -> Seq("iv"))
The aggregate function does not do that (except that it is a very general function, and it could be used to do that). You want
groupBy. Close to at least. As you start with a
Seq[(String, String)], and you group by taking the first item in the tuple (which is
(String, String) => String), it would return a
Map[String, Seq[(String, String)]). You then have to discard the first parameter in the Seq[String, String)] values.
There you get a
Map[String, Seq[(String, String)]. If you want a
Seq instead of
toSeq on the result. I don't think you have a guarantee on the order in the resulting Seq though
Aggregate is a more difficult function.
Consider first reduceLeft and reduceRight.
Let as be a non empty sequence
as = Seq(a1, ... an) of elements of type A, and
f: (A,A) => A be some way to combine two elements of type
A into one. I will note it as a binary operator
a1 @ a2 rather than
as.reduceLeft(@) will compute
(((a1 @ a2) @ a3)... @ an).
reduceRight will put the parentheses the other way,
(a1 @ (a2 @... @ an)))). If
@ happens to be associative, one does not care about the parentheses. One could compute it as
(a1 @... @ ap) @ (ap+1 @[email protected]) (there would be parantheses inside the 2 big parantheses too, but let's not care about that). Then one could do the two parts in parallel, while the nested bracketing in reduceLeft or reduceRight force a fully sequential computation. But parallel computation is only possible when
@ is known to be associative, and the reduceLeft method cannot know that.
Still, there could be method
reduce, whose caller would be responsible for ensuring that the operation is associative. Then
reduce would order the calls as it sees fit, possibly doing them in parallel. Indeed, there is such a method.
There is a limitation with the various reduce methods however. The elements of the Seq can only be combined to a result of the same type:
@ has to be
(A,A) => A. But one could have the more general problem of combining them into a
B. One starts with a value of type
b of type
B, and combine it with every elements of the sequence. The operator
(B,A) => B, and one computes
(((b @ a1) @ a2) ... @ an).
foldLeft does that.
foldRight does the same thing but starting with
an. There, the
@ operation has no chance to be associative. When one writes
b @ a1 @ a2, it must mean
(b @ a1) @ a2, as
(a1 @ a2) would be ill-typed. So foldLeft and foldRight have to be sequential.
Suppose however, that each
A can be turned into a
B, let's write it with
a! is of type
B. Suppose moreover that there is a
(B,B) => B, and that
@ is such that
b @ a is in fact
b + a!. Rather than combining elements with @, one could first transform all of them to B with
!, then combine them with
+. That would be
as.map(!).reduceLeft(+). And if
+ is associative, then that can be done with reduce, and not be sequential: as.map(!).reduce(+). There could be an hypothetical method as.associativeFold(b, !, +).
Aggregate is very close to that. It may be however, that there is a more efficient way to implement
[email protected] than
b+a! For instance, if type
List[A], and [email protected] is a::b, then
a! will be
b1 + b2 will be
b2 ::: b1. a::b is way better than (a::Nil):::b. To benefit from associativity, but still use
@, one first splits
b + a1! + ... + an!, into
(b + a1! + ap!) + (ap+1! + ..+ an!), then go back to using
(b @ a1 @ an) + (ap+1! @ @ an). One still needs the ! on ap+1, because one must start with some b. And the + is still necessary too, appearing between the parantheses. To do that,
as.associativeFold(!, +) could be changed to
as.optimizedAssociativeFold(b, !, @, +).
+ is associative, or equivalently,
(B, +) is a semigroup. In practice, most of the semigroups used in programming happen to be monoids too, i.e they contain a neutral element
z (for zero) in B, so that for each
z + b =
b + z =
b. In that case, the
! operation that make sense is likely to be be
a! = z @ a. Moreover, as z is a neutral element
b @ a1 [email protected] an = (b + z) @ a1 @ an which is
b + (z + a1 @ an). So is is always possible to start the aggregation with z. If
b is wanted instead, you do
b + result at the end. With all those hypotheses, we can do a
s.aggregate(z, @, +). That is what
@ is the
seqop argument (applied in a sequence
z @ a1 @ a2 @ ap), and
combop (applied to already partially combined results, as in
(z + [email protected]@ap) + (z + [email protected]@an)).
To sum it up,
as.aggregate(z)(seqop, combop) computes the same thing as
as.foldLeft(z)( seqop) provided that
(B, combop, z)is a monoid
seqop(b,a) = combop(b, seqop(z,a))
aggregate implementation may use the associativity of combop to group the computations as it likes (not swapping elements however, + has not to be commutative, ::: is not). It may run them in parallel.
Finally, solving the initial problem using
aggregate is left as an exercise to the reader. A hint: implement using
foldLeft, then find
combo that will satisfy the conditions stated above.