Daniel Daniel - 7 months ago 52
C# Question

Efficiency of Linq GroupJoin vs. Linq All in Select

Does LINQ use sorting or some other mechanism to make a group join more efficient, so that it doesn't have to loop through the entire second collection for every unmatched item?

In other words,
Is this:

var x = listA.GroupJoin(
    listB,
    a => a.prop,
    b => b.prop,
    (a, b) => new { a, b })
    .Where(g => !g.b.Any())
    .Select(g => g.a);

more efficient than this:

var x = listA.Where(a => listB.All(b => b.prop != a.prop));


I assume the question is about LINQ to Objects, i.e. Enumerable.GroupJoin. In that case, yes: the LINQ implementation of GroupJoin (as well as Join) uses a hash table, one of the most efficient general-purpose lookup data structures. This can be seen in the reference source, and it is also mentioned (although not directly) in the Remarks section of the documentation:

If comparer is null, the default equality comparer, Default, is used to hash and compare keys.

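Since a hash lookup takes O(1) time on average, the join costs O(N + M): one pass over listB (M items) to build the hash table and one pass over listA (N items) to probe it. The second query scans listB for every item of listA, and scans all of it for every unmatched item, which is O(N * M) in the worst case. So the join is asymptotically much more efficient.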
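To make the difference concrete, here is a minimal sketch of the three ways to find the unmatched items. The Item class, the AntiJoinSketch class and its method names are hypothetical, invented for this illustration. UnmatchedViaLookup spells out roughly what GroupJoin does internally (build a hash-based lookup over the inner sequence once, then probe it per outer item); it is an approximation of that idea, not a copy of the reference source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type; stands in for whatever listA and listB contain.
public sealed class Item
{
    public int Prop { get; }
    public string Name { get; }

    public Item(int prop, string name)
    {
        Prop = prop;
        Name = name;
    }
}

public static class AntiJoinSketch
{
    // The query from the question: GroupJoin, then keep items whose group is empty.
    public static List<Item> UnmatchedViaGroupJoin(
        IEnumerable<Item> listA, IEnumerable<Item> listB)
    {
        return listA
            .GroupJoin(listB, a => a.Prop, b => b.Prop,
                       (a, matches) => new { a, matches })
            .Where(g => !g.matches.Any())
            .Select(g => g.a)
            .ToList();
    }

    // Hand-written equivalent: one pass over listB to build a hash-based lookup,
    // then one average O(1) probe per item of listA.
    public static List<Item> UnmatchedViaLookup(
        IEnumerable<Item> listA, IEnumerable<Item> listB)
    {
        ILookup<int, Item> byProp = listB.ToLookup(b => b.Prop);

        // Indexing a lookup with a missing key yields an empty sequence.
        return listA.Where(a => !byProp[a.Prop].Any()).ToList();
    }

    // The second query from the question: scans listB for every item of listA.
    public static List<Item> UnmatchedViaAll(
        IEnumerable<Item> listA, IEnumerable<Item> listB)
    {
        return listA.Where(a => listB.All(b => b.Prop != a.Prop)).ToList();
    }
}
```

All three methods return the same items in the same order (the order of listA); only the amount of work differs. If you need nothing but the keys of listB, a HashSet of those keys combined with Where and Contains is another common way to get the same average O(N + M) behaviour.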