onedaywhen onedaywhen - 4 months ago 17
SQL Question

Relational division in LINQ?

Relational division is one of Codd's primitive relational operators, colloquially known as the suppliers who supply all parts. There have been various translations into SQL e.g. Celko discusses several approaches using the example of the pilots who can fly all the planes in the hangar.

The one I prefer is "division with set operators" because it is "with remainder" (i.e. Wilson can also fly a F-17 Fighter but there isn't one in the hangar) and how it handles the case when the divisor is the empty set (i.e. when the hangar is empty then all pilots are returned):

WITH PilotSkills
AS
(
SELECT *
FROM (
VALUES ( 'Celko', 'Piper Cub' ),
( 'Higgins', 'B-52 Bomber' ), ( 'Higgins', 'F-14 Fighter' ),
( 'Higgins', 'Piper Cub' ),
( 'Jones', 'B-52 Bomber' ), ( 'Jones', 'F-14 Fighter' ),
( 'Smith', 'B-1 Bomber' ), ( 'Smith', 'B-52 Bomber' ),
( 'Smith', 'F-14 Fighter' ),
( 'Wilson', 'B-1 Bomber' ), ( 'Wilson', 'B-52 Bomber' ),
( 'Wilson', 'F-14 Fighter' ), ( 'Wilson', 'F-17 Fighter' )
) AS T ( pilot_name, plane_name )
),
Hangar
AS
(
SELECT *
FROM (
VALUES ( 'B-1 Bomber' ),
( 'B-52 Bomber' ),
( 'F-14 Fighter' )
) AS T ( plane_name )
)
SELECT DISTINCT pilot_name
FROM PilotSkills AS P1
WHERE NOT EXISTS (
SELECT plane_name
FROM Hangar
EXCEPT
SELECT plane_name
FROM PilotSkills AS P2
WHERE P1.pilot_name = P2.pilot_name
);


Now I need to do this in LINQ to Objects. Here's a suggested direct translation:

var hangar = new []
{
new { PlaneName = "B-1 Bomber" },
new { PlaneName = "F-14 Fighter" },
new { PlaneName = "B-52 Bomber" }
}.AsEnumerable();

var pilotSkills = new []
{
new { PilotName = "Celko", PlaneName = "Piper Cub" },
new { PilotName = "Higgins", PlaneName = "B-52 Bomber" },
new { PilotName = "Higgins", PlaneName = "F-14 Fighter" },
new { PilotName = "Higgins", PlaneName = "Piper Cub" },
new { PilotName = "Jones", PlaneName = "B-52 Bomber" },
new { PilotName = "Jones", PlaneName = "F-14 Fighter" },
new { PilotName = "Smith", PlaneName = "B-1 Bomber" },
new { PilotName = "Smith", PlaneName = "B-52 Bomber" },
new { PilotName = "Smith", PlaneName = "F-14 Fighter" },
new { PilotName = "Wilson", PlaneName = "B-1 Bomber" },
new { PilotName = "Wilson", PlaneName = "B-52 Bomber" },
new { PilotName = "Wilson", PlaneName = "F-14 Fighter" },
new { PilotName = "Wilson", PlaneName = "F-17 Fighter" }
}.AsEnumerable();

var actual = pilotSkills.Where
(
p1 => hangar.Except
(
pilotSkills.Where( p2 => p2.PilotName == p1.PilotName )
.Select( p2 => new { p2.PlaneName } )
).Any() == false
).Select( p1 => new { p1.PilotName } ).Distinct();

var expected = new []
{
new { PilotName = "Smith" },
new { PilotName = "Wilson" }
};

Assert.That( actual, Is.EquivalentTo( expected ) );


As LINQ is supposedly based on the relational algebra then a direct translation seems reasonable. But is there a better 'native' LINQ approach?




Reflecting on @Daniel Hilgarth's answer, in .NET Land the data is likely to be 'grouped' to begin with:

var pilotSkills = new []
{
new { PilotName = "Celko",
Planes = new []
{ new { PlaneName = "Piper Cub" }, } },
new { PilotName = "Higgins",
Planes = new []
{ new { PlaneName = "B-52 Bomber" },
new { PlaneName = "F-14 Fighter" },
new { PlaneName = "Piper Cub" }, } },
new { PilotName = "Jones",
Planes = new []
{ new { PlaneName = "B-52 Bomber" },
new { PlaneName = "F-14 Fighter" }, } },
new { PilotName = "Smith",
Planes = new []
{ new { PlaneName = "B-1 Bomber" },
new { PlaneName = "B-52 Bomber" },
new { PlaneName = "F-14 Fighter" }, } },
new { PilotName = "Wilson",
Planes = new []
{ new { PlaneName = "B-1 Bomber" },
new { PlaneName = "B-52 Bomber" },
new { PlaneName = "F-14 Fighter" },
new { PlaneName = "F-17 Fighter" }, } },
};


...and projecting just the name is arbitrary, making the potential solutions much more straightforward:

// Easy to understand at a glance:
var actual1 = pilotSkills.Where( x => hangar.All( y => x.Planes.Contains(y) ));

// Potentially more efficient:
var actual = pilotSkills.Where( x => !hangar.Except( x.Planes ).Any() );

Answer

The following should yield the same result:

pilotSkills.GroupBy(x => x.PilotName, x => x.PlaneName)
           .Where(g => hangar.All(y => g.Contains(y.PlaneName)))

This will return one group per pilot that can fly all planes in the hangar.
The key of the group is the pilot's name, the content of the group are all the planes the pilot can fly, even those that aren't in the hangar.

If you now just want the pilots, you can add a .Select(g => new { PilotName = g.Key }) to the end of the query.


Using the above approach with Except, making it closer to the OP's original:

pilotSkills.GroupBy(x => x.PilotName, x => new { x.PlaneName } )
           .Where(g => !hangar.Except(g).Any());

This second query is potentially better, because it iterates g only once; the first query with Contains iterates it N times with N being the number of planes in the hangar.