christangrant - 7 months ago 36

Scala Question

I have been looking and I cannot find an example or discussion of the

`aggregate`

Can this function be used to reduce the values of tuples to make a multimap-type collection? For example:

`val list = Seq(("one", "i"), ("two", "2"), ("two", "ii"), ("one", "1"), ("four", "iv"))`

After applying aggregate:

`Seq(("one" -> Seq("i","1")), ("two" -> Seq("2", "ii")), ("four" -> Seq("iv"))`

Also, can you give example of parameters

`z`

`segop`

`combop`

Answer

The aggregate function does not do that (except that it is a very general function, and it could be used to do that). You want `groupBy`

. Close to at least. As you start with a `Seq[(String, String)]`

, and you group by taking the first item in the tuple (which is `(String, String) => String)`

, it would return a `Map[String, Seq[(String, String)]`

). You then have to discard the first parameter in the Seq[String, String)] values.

So

```
list.groupBy(_._1).mapValues(_.map(_._2))
```

There you get a `Map[String, Seq[(String, String)]`

. If you want a `Seq`

instead of `Map`

, call `toSeq`

on the result. I don't think you have a guarantee on the order in the resulting Seq though

Aggregate is a more difficult function.

Consider first reduceLeft and reduceRight.
Let as be a non empty sequence `as = Seq(a1, ... an)`

of elements of type A, and `f: (A,A) => A`

be some way to combine two elements of type `A`

into one. I will note it as a binary operator `@`

, `a1 @ a2`

rather than `f(a1, a2)`

. `as.reduceLeft(@)`

will compute `(((a1 @ a2) @ a3)... @ an)`

. `reduceRight`

will put the parentheses the other way, `(a1 @ (a2 @... @ an))))`

. If `@`

happens to be associative, one does not care about the parentheses. One could compute it as `(a1 @... @ ap) @ (ap+1 @...@an)`

(there would be parantheses inside the 2 big parantheses too, but let's not care about that). Then one could do the two parts in parallel, while the nested bracketing in reduceLeft or reduceRight force a fully sequential computation. But parallel computation is only possible when `@`

is known to be associative, and the reduceLeft method cannot know that.

Still, there could be method `reduce`

, whose caller would be responsible for ensuring that the operation is associative. Then `reduce`

would order the calls as it sees fit, possibly doing them in parallel. Indeed, there is such a method.

There is a limitation with the various reduce methods however. The elements of the Seq can only be combined to a result of the same type: `@`

has to be `(A,A) => A`

. But one could have the more general problem of combining them into a `B`

. One starts with a value of type `b`

of type `B`

, and combine it with every elements of the sequence. The operator `@`

is `(B,A) => B`

, and one computes `(((b @ a1) @ a2) ... @ an)`

. `foldLeft`

does that. `foldRight`

does the same thing but starting with `an`

. There, the `@`

operation has no chance to be associative. When one writes `b @ a1 @ a2`

, it must mean `(b @ a1) @ a2`

, as `(a1 @ a2)`

would be ill-typed. So foldLeft and foldRight have to be sequential.

Suppose however, that each `A`

can be turned into a `B`

, let's write it with `!`

, `a!`

is of type `B`

. Suppose moreover that there is a `+`

operation `(B,B) => B`

, and that `@`

is such that `b @ a`

is in fact `b + a!`

. Rather than combining elements with @, one could first transform all of them to B with `!`

, then combine them with `+`

. That would be `as.map(!).reduceLeft(+)`

. And if `+`

is associative, then that can be done with reduce, and not be sequential: as.map(!).reduce(+). There could be an hypothetical method as.associativeFold(b, !, +).

Aggregate is very close to that. It may be however, that there is a more efficient way to implement `b@a`

than `b+a!`

For instance, if type `B`

is `List[A]`

, and b@a is a::b, then `a!`

will be `a::Nil`

, and `b1 + b2`

will be `b2 ::: b1`

. a::b is way better than (a::Nil):::b. To benefit from associativity, but still use `@`

, one first splits `b + a1! + ... + an!`

, into `(b + a1! + ap!) + (ap+1! + ..+ an!)`

, then go back to using `@`

with `(b @ a1 @ an) + (ap+1! @ @ an)`

. One still needs the ! on ap+1, because one must start with some b. And the + is still necessary too, appearing between the parantheses. To do that, `as.associativeFold(!, +)`

could be changed to `as.optimizedAssociativeFold(b, !, @, +)`

.

Back to `+`

. `+`

is associative, or equivalently, `(B, +)`

is a semigroup. In practice, most of the semigroups used in programming happen to be monoids too, i.e they contain a neutral element `z`

(for *zero*) in B, so that for each `b`

, `z + b`

= `b + z`

= `b`

. In that case, the `!`

operation that make sense is likely to be be `a! = z @ a`

. Moreover, as z is a neutral element `b @ a1 ..@ an = (b + z) @ a1 @ an`

which is `b + (z + a1 @ an)`

. So is is always possible to start the aggregation with z. If `b`

is wanted instead, you do `b + result`

at the end. With all those hypotheses, we can do a`s.aggregate(z, @, +)`

. That is what `aggregate`

does. `@`

is the `seqop`

argument (applied in a *sequence* `z @ a1 @ a2 @ ap`

), and `+`

is `combop`

(applied to already partially *combined* results, as in `(z + a1@...@ap) + (z + ap+1@...@an)`

).

To sum it up, `as.aggregate(z)(seqop, combop)`

computes the same thing as `as.foldLeft(z)( seqop)`

provided that

`(B, combop, z)`

is a monoid`seqop(b,a) = combop(b, seqop(z,a))`

aggregate implementation may use the associativity of combop to group the computations as it likes (not swapping elements however, + has not to be commutative, ::: is not). It may run them in parallel.

Finally, solving the initial problem using `aggregate`

is left as an exercise to the reader. A hint: implement using `foldLeft`

, then find `z`

and `combo`

that will satisfy the conditions stated above.