Corporalis Corporalis - 1 month ago 21
SQL Question

Entity Framework Include performance

I have been looking at Entity Framework performance, particularly around the use of Includes and the time taken to both generate and execute the various queries.

I am going to detail changes I have made, but please correct me if you think any of these assumptions are wrong.

Firstly we have around 10,000 items (not many) in a db and the database is significantly normalised (which results in a significant number of navigation properties). Currently the approach is to lazy load everything and given that requesting one item can spool off tens of db requests, the performance is quite poor, particularly for larger sets of data.
(This is an inherited project and step one is trying to improve performance without significant restructuring)

So my first step was to take the results of a query and then apply the Includes for the navigation properties only to those results.
I know this technically performs 2 queries, but if we have 10,000 items stored, but only want to return 10 items, it makes more sense to only include the navigation properties on those 10 items.

Secondly, where multiple includes are used on a query result and that result set size is quite large, it still suffered from poor performance. I have been pragmatic about when to eager load and when to leave the lazy loading in place.
My next change was to load query includes in batches, so performing:

query.Include(q => q.MyInclude).Load();


This once again significantly improved performance, although a few more db calls (one for each batch of includes) it was quicker than a large query or at the very least reduced the overhead of of Entity Framework trying to produce that large query.

So the code now looks something like this:

var query = ctx.Filters.Where(x => x.SessionId == id)
.Join(ctx.Items, i => i.ItemId, fs => fs.Id, (f, fs) => fs);
query
.Include(x => x.ItemNav1)
.Include(x => x.ItemNav2).Load();

query
.Include(x => x.ItemNav3)
.Include(x => x.ItemNav4).Load();

query
.Include(x => x.ItemNav5)
.Include(x => x.ItemNav6).Load();


Now this is reasonably performant, however, it would be nice to improve this further.

I had considered using
LoadAsync()
, which after a bit more refactoring would be possible and it would better fit with the rest of the architecture.

However, you can only execute one query at a time on a db context. So I was wondering if there was anyway to possible create a new db context, perform
LoadAsync()
on each group of navigation properties (asynchronously) and then concatenate all of the results.

I know technically how you might create a new context, fire off a
LoadAsync()
for each navigation group, but not how to concatenate the results, I don't know if it is definitely possible or whether it goes against good practice.

So my question is; is this possible or, is there another way I can further improve performance? I'm trying to stick with what Entity Framework provides rather than crafting some stored procs. Thanks

UPDATE

Regarding the performance disparity I'm seeing between using all Includes in one statement and Loading these in small groups. When running a query that returns 6000 items. (Using SQL profiler and VS diagnostics to determine times)

Grouped Includes: In total takes ~8 seconds to execute the includes.

Includes in one statement: SQL query is taking ~30 seconds to load. (Often getting timeouts)

After a bit more investigation, I don't think there is much overhead when EF converts the sql results to models. However we have seen nearly 500ms taken for EF to generate complex queries, which isn't ideal, but I'm not sure this can be resolved

UPDATE 2

With Ivan's help and following this https://msdn.microsoft.com/en-gb/data/hh949853.aspx we were able to improve things further, particularly using
SelectMany
. I would highly recommend the msdn article to anyone attempting to improve their EF performance.

Answer

Your second approach relies on the EF navigation property fixup process. The problem is though that every

query.Include(q => q.ItemNavN).Load();

statement will also include all the master record data along with the related entity data.

Using the same basic idea, one potential improvement could be to execute one Load per each navigation property, replacing the Include with either Select (for references) or SelectMany (for collections) - something similar to how EF Core processes the Includes internally.

Taking your second approach example, you could try the following and compare the performance:

var query = ctx.Filters.Where(x => x.SessionId == id)
    .Join(ctx.Items, i => i.ItemId, fs => fs.Id, (f, fs) => fs);

query.Select(x => x.ItemNav1).Load();
query.Select(x => x.ItemNav2).Load();
query.Select(x => x.ItemNav3).Load();
query.Select(x => x.ItemNav4).Load();
query.Select(x => x.ItemNav5).Load();
query.Select(x => x.ItemNav6).Load();

var result = query.ToList();
// here all the navigation properties should be populated 
Comments