haisi haisi - 29 days ago 8
SQL Question

Grouping by two properties and mapping to different object

I have the following data:

uuid  id1  id2  hId  hName      percent  golden
1     J    K    a    fetchflow  38%      34%
2     J    K    b    codelibs1  45%      34%
3     J    K    c    codelibs2  97%      34%
10    K    L    a    fetchflow  16%      10%
11    K    L    b    codelibs1  95%      10%
12    K    L    c    codelibs2  12%      10%
13    K    M    a    fetchflow  64%      14%
14    K    M    b    codelibs1  53%      14%
15    K    M    c    codelibs2  48%      14%


And want to get to this:

Compare  To  Golden  a    b    c
J        K   34%     38%  45%  97%
K        L   10%     16%  95%  12%
K        M   14%     64%  53%  48%


Note: Pair(id1, id2) == Pair(id2, id1), so they're interchangeable.

I want to store it in the following Java data structure:

class Foo {
    int id1;
    int id2;
    double golden;
    /*
      [a -> 0.38,
       b -> 0.45,
       c -> 0.97]
    */
    Map<Integer, Double> comparisons;
}


I currently have the following code, but I can't map it to the data structure that I want:

comparisons
    .stream()
    .collect(
        groupingBy(
            Function.identity(),
            () -> new TreeMap<>(
                Comparator.<ComparisonResultSet, Integer>comparing(o -> o.vacancy_id_1).thenComparing(o -> o.vacancy_id_2)
            ),
            collectingAndThen(
                reducing((o, o2) -> o), Optional::get
            )
        ));

Answer

One solution, or rather starting point, would be

List<Foo> result = list.stream().collect(Collectors.collectingAndThen(
    Collectors.groupingBy(
            o -> Arrays.asList(o.vacancy_id_1, o.vacancy_id_2),
            Collectors.toMap(o -> o.hId, o -> Arrays.asList(o.percent, o.golden))),
    m -> m.entrySet().stream().map(e -> new Foo(
            e.getKey().get(0), e.getKey().get(1),
            e.getValue().values().stream().mapToDouble(l->l.get(1))
                    .reduce((a,b)->{assert a==b; return a; }).getAsDouble(),
            e.getValue().entrySet().stream()
                    .collect(Collectors.toMap(Map.Entry::getKey, en->en.getValue().get(0)))
    )).collect(Collectors.toList())
));

which only uses standard Collection classes, which complicates matters. It groups by Arrays.asList(o.vacancy_id_1, o.vacancy_id_2), which implies an ordering of the IDs. You could wrap it in new HashSet<>(…) to get an order-independent key; however, that complicates the construction of the Foo instances, as dedicated id1 and id2 values are required and now have to be pulled out of the set via an iterator, i.e.

List<Foo> result = list.stream().collect(Collectors.collectingAndThen(
    Collectors.groupingBy(
            o -> new HashSet<>(Arrays.asList(o.vacancy_id_1, o.vacancy_id_2)),
            Collectors.toMap(o -> o.hId, o -> Arrays.asList(o.percent, o.golden))),
    m -> m.entrySet().stream().map(e -> {
        Iterator<Integer> it = e.getKey().iterator();
        return new Foo(
            it.next(), it.next(),
            e.getValue().values().stream().mapToDouble(l->l.get(1))
                    .reduce((a,b)->{assert a==b; return a; }).getAsDouble(),
            e.getValue().entrySet().stream()
                    .collect(Collectors.toMap(Map.Entry::getKey, en->en.getValue().get(0)))
        );
    }).collect(Collectors.toList())
));

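Note that new HashSet<>(Arrays.asList(o.vacancy_id_1, o.vacancy_id_2)) could be replaced by Set.of(o.vacancy_id_1, o.vacancy_id_2) as of Java 9, as long as the two IDs are never equal, since Set.of rejects duplicate elements with an IllegalArgumentException.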
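To illustrate why the choice of grouping key matters, here is a small sketch comparing a List key with a Set key. The class and method names (KeyOrderDemo, listKey, setKey) are made up for this illustration, and the integer IDs are arbitrary placeholders.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class KeyOrderDemo {
    // Ordered key: lists compare element by element, so (1, 2) and (2, 1) differ.
    static List<Integer> listKey(int id1, int id2) {
        return Arrays.asList(id1, id2);
    }

    // Unordered key: sets ignore element order, so (1, 2) and (2, 1) are equal.
    static Set<Integer> setKey(int id1, int id2) {
        return new HashSet<>(Arrays.asList(id1, id2));
    }

    public static void main(String[] args) {
        System.out.println(listKey(1, 2).equals(listKey(2, 1))); // false
        System.out.println(setKey(1, 2).equals(setKey(2, 1)));   // true
        // Caveat: with two equal IDs the set holds only one element,
        // so a second it.next() would throw NoSuchElementException.
        System.out.println(setKey(3, 3).size());                 // 1
    }
}
```

So with the set-based key, rows stored as (J, K) and (K, J) would land in the same group, but a pair of identical IDs needs special handling when the two IDs are read back from the set.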

A dedicated order-independent pair type would make the operation simpler, especially if you replace the two ID properties with a single property of that type in both the source and the result type, right from the start.

Another obstacle is the “golden” property. Without it, the downstream collector would be Collectors.toMap(o -> o.hId, o -> o.percent), producing exactly the desired map for the Foo result. Since we have to carry another property here, the map needs a subsequent conversion step, after the “golden” property has been reduced to a single value.
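For comparison, the simpler case without “golden” might look like the following sketch. Row is a hypothetical stand-in for the question's element type (only the field names are taken from the code above), and the IDs 1 and 2 are placeholders for J and K.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PercentOnlyDemo {
    // Hypothetical stand-in for the question's row type (only the fields needed here).
    static final class Row {
        final int vacancy_id_1, vacancy_id_2;
        final String hId;
        final double percent;

        Row(int vacancy_id_1, int vacancy_id_2, String hId, double percent) {
            this.vacancy_id_1 = vacancy_id_1;
            this.vacancy_id_2 = vacancy_id_2;
            this.hId = hId;
            this.percent = percent;
        }
    }

    // Groups rows by the (ordered) ID pair and collects hId -> percent per group.
    static Map<List<Integer>, Map<String, Double>> group(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                o -> Arrays.asList(o.vacancy_id_1, o.vacancy_id_2),
                Collectors.toMap(o -> o.hId, o -> o.percent)));
    }

    public static void main(String[] args) {
        // IDs 1 and 2 are placeholders for the question's J and K.
        List<Row> rows = Arrays.asList(
                new Row(1, 2, "a", 0.38),
                new Row(1, 2, "b", 0.45),
                new Row(1, 2, "c", 0.97));
        System.out.println(group(rows).get(Arrays.asList(1, 2)));
    }
}
```

Here the inner map can go straight into a Foo. It is only the extra per-group value that forces the second pass over the entries.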

Using a pair class like

public final class UnorderedPair<T> {
    public final T a, b;

    public UnorderedPair(T a, T b) {
        this.a = a;
        this.b = b;
    }
    // The sum is symmetric, so (a, b) and (b, a) get the same hash code.
    @Override
    public int hashCode() {
        return a.hashCode()+b.hashCode()+UnorderedPair.class.hashCode();
    }
    // Two pairs are equal if their elements match in either order.
    @Override
    public boolean equals(Object obj) {
        if(this == obj) return true;
        if(!(obj instanceof UnorderedPair)) return false;
        final UnorderedPair<?> other = (UnorderedPair<?>) obj;
        return a.equals(other.a) && b.equals(other.b)
            || a.equals(other.b) && b.equals(other.a);
    }
}

and the pairing collector from this answer, we get

List<Foo> result = list.stream().collect(Collectors.collectingAndThen(
    Collectors.groupingBy(
        o -> new UnorderedPair<>(o.vacancy_id_1, o.vacancy_id_2),
            pairing(
                Collectors.toMap(o -> o.hId, o -> o.percent),
                Collectors.reducing(null, o -> o.golden,
                    (a,b) -> {assert a==null || a.doubleValue()==b; return b; }),
            (m,golden) -> new AbstractMap.SimpleImmutableEntry<>(m,golden))),
    m -> m.entrySet().stream().map(e -> new Foo(
        e.getKey().a, e.getKey().b, e.getValue().getValue(), e.getValue().getKey()))
    .collect(Collectors.toList())
));

but, as said, having a single property of the unordered pair type in source and result would simplify the task much more.
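Since the pairing collector itself is not shown here, the following is a minimal sketch of what such a collector could look like. It is an assumption about its shape, not the code from the linked answer. The names PairingSketch, Holder and demo are made up for illustration. On Java 12 and newer, Collectors.teeing serves a similar purpose.

```java
import java.util.function.BiFunction;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PairingSketch {
    // Mutable holder for the accumulation containers of both downstream collectors.
    private static final class Holder<A1, A2> {
        A1 first;
        A2 second;

        Holder(A1 first, A2 second) {
            this.first = first;
            this.second = second;
        }
    }

    // Feeds every element to both collectors and merges their two results at the end.
    static <T, A1, A2, R1, R2, R> Collector<T, ?, R> pairing(
            Collector<T, A1, R1> c1, Collector<T, A2, R2> c2,
            BiFunction<? super R1, ? super R2, R> finisher) {
        return Collector.<T, Holder<A1, A2>, R>of(
                () -> new Holder<A1, A2>(c1.supplier().get(), c2.supplier().get()),
                (h, t) -> {
                    c1.accumulator().accept(h.first, t);
                    c2.accumulator().accept(h.second, t);
                },
                (h1, h2) -> {
                    h1.first = c1.combiner().apply(h1.first, h2.first);
                    h1.second = c2.combiner().apply(h1.second, h2.second);
                    return h1;
                },
                h -> finisher.apply(
                        c1.finisher().apply(h.first),
                        c2.finisher().apply(h.second)));
    }

    // Collects the elements into a list and counts them in a single pass.
    static String demo() {
        return Stream.of("a", "b", "c").collect(pairing(
                Collectors.toList(), Collectors.counting(),
                (list, n) -> list + " has " + n + " elements"));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, b, c] has 3 elements
    }
}
```

In the grouping code above, the two downstream collectors would be the toMap and reducing collectors, and the finisher would wrap both results in an entry.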