Triforce711 Triforce711 - 10 months ago 73
SQL Question

LINQ to Entities performance issue with Where and Contains

Table A has - ID(PK), PartNumber, Code1, Code2

Table B has - InventoryID(PK) PartNumber, Part, and a bunch of other columns.

I need to get everything from Table B where Table B's PartNumber is NOT in Table A.

Example: Table B has PartNumber 123. There is no PartNumber in Table A for 123. Get that row.

What I currently have:

using (SomeEntity context = new SomeEntity())
var partmasterids = context.PartsMasters.Select(x => x.PartNumber).Distinct().ToList();
var test = context.Parts.Where(x => !partmasterids.Contains(x.PartNumber)).ToList();

I currently first get and select all the distinct part numbers from Table A.

Then I check Table A and Table B's partnumbers and get each part from Table B where there that part number is not in Table A.

There are about 11,000 records in table B and 200,000 records in table A.
I should be getting about 9000 parts which are not in table A.

I am running into huge performance issues with that second LINQ statement. If I do a .Take(100), that will even take around 20-30 seconds. Anything above 1000 will take way too long.

Is there a better way to write this LINQ statement?

Answer Source

From what I understand, the equivalent in SQL would be something like

SELECT B.PartNumber AS MissingParts
FROM TableB as B 
LEFT OUTER JOIN TableA as A ON B.PartNumber = A.PartNumber

Run that SQL and measure the time it takes. Without indexes, that's as fast as it's going to get.

Now, if you really have to do it in EF, you'll need to do an equivalent statement, complete with the left join. Based on this question, it would probably look something like this (untested!)

var query = from b in TableB
        join a in TableA on b.PartNumber equals a.PartNumber into joind
        from existsInA in joind.DefaultIfEmpty()
        where existsInA == null
        select b.PartNumber;

var missingParts = query.Distinct().ToList();