Ken Smith Ken Smith - 1 month ago 11
C# Question

Entity Framework query caching

This MSDN article lists a whole bunch of ways to improve your Entity Framework performance:

https://msdn.microsoft.com/en-us/data/hh949853.aspx

One of its suggestions (4.3) is to convert the properties of a non-mapped object into a local variable, so that EF can cache its internal query plan.

Sounds like a great idea. So I put it to the test with a simple query that compared the performance over 10,000 iterations of an indirect property reference in a query to a local variable. Like so:

[Fact]
public void TestQueryCaching()
{
const int iterations = 1000;

var quote = new Quote();
using (var ctx = new CoreContext())
{
quote.QuoteId = ctx.Quotes.First().Id;
}

double indirect = 0;
double direct = 0;

10.Times(it =>
{
indirect += PerformCoreDbTest(iterations, "IndirectValue", (ctx, i) =>
{
var dbQuote = ctx.Quotes.First(x => x.Id == quote.QuoteId);
}).TotalSeconds;
direct += PerformCoreDbTest(iterations, "DirectValue", (ctx, i) =>
{
var quoteId = quote.QuoteId;
var dbQuote = ctx.Quotes.First(x => x.Id == quoteId);
}).TotalSeconds;
});

_logger.Debug($"Indirect seconds: {indirect:0.00}, direct seconds:{direct:0.00}");
}


But I'm not seeing any real performance benefit. On two different machines, these are the results over 5 iterations:

Machine1 - Indirect seconds: 9.06, direct seconds:9.36
Machine1 - Indirect seconds: 9.98, direct seconds:9.84
Machine2 - Indirect seconds: 22.41, direct seconds:20.38
Machine2 - Indirect seconds: 17.27, direct seconds:16.93
Machine2 - Indirect seconds: 16.35, direct seconds:16.32


Using a local variable - the "direct" approach that the MSDN article recommends - is maybe the tiniest bit faster (4/5 times), but not consistently, and not really by much.

Am I doing something wrong in my testing? Or is the effect really so slight that it doesn't make much difference? Or is the MSDN article basically wrong, and that way of referring to objects doesn't make any difference to query caching?

** Edits 10/9/16 **
I modified the query to (a) make it more complex, and (b) to pass in a different quoteId each time. I suspect the latter is important, as otherwise the query does in fact get cached - since there aren't any parameters. See the answer from @raderick below.

Here's the more complex test:

[Fact]
public void TestQueryCaching()
{
const int iterations = 1000;

List<EFQuote> quotes;
using (var ctx = new CoreContext())
{
quotes = ctx.Quotes.Take(iterations).ToList();
}

double indirect = 0;
double direct = 0;
double iqueryable = 0;

10.Times(it =>
{
indirect += PerformCoreDbTest(iterations, "IndirectValue", (ctx, i) =>
{
var quote = quotes[i];
var dbQuote = ctx.Quotes
.Include(x => x.QuoteGroup.QuoteGroupElements.Select(e => e.DefaultElement.DefaultChoices))
.Include(x => x.QuoteElements.Select(e => e.DefaultElement.DefaultChoices))
.Include(x => x.QuotePackage)
.Include(x => x.QuoteDefinition)
.Include(x => x.QuoteLines)
.First(x => x.Id == quote.Id);
}).TotalSeconds;
direct += PerformCoreDbTest(iterations, "DirectValue", (ctx, i) =>
{
var quoteId = quotes[i].Id;
var dbQuote = ctx.Quotes
.Include(x => x.QuoteGroup.QuoteGroupElements.Select(e => e.DefaultElement.DefaultChoices))
.Include(x => x.QuoteElements.Select(e => e.DefaultElement.DefaultChoices))
.Include(x => x.QuotePackage)
.Include(x => x.QuoteDefinition)
.Include(x => x.QuoteLines)
.First(x => x.Id == quoteId);
}).TotalSeconds;
iqueryable += PerformCoreDbTest(iterations, "IQueryable", (ctx, i) =>
{
var quoteId = quotes[i].Id;
var dbQuote = ctx.Quotes
.Include(x => x.QuoteGroup.QuoteGroupElements.Select(e => e.DefaultElement.DefaultChoices))
.Include(x => x.QuoteElements.Select(e => e.DefaultElement.DefaultChoices))
.Include(x => x.QuotePackage)
.Include(x => x.QuoteDefinition)
.Include(x => x.QuoteLines)
.Where(x => x.Id == quoteId).First();
}).TotalSeconds;
});

_logger.Debug($"Indirect seconds: {indirect:0.00}, direct seconds:{direct:0.00}, iqueryable seconds:{iqueryable:0.00}");
}


And the results are much more like what the MSDN article above describes:

Indirect seconds: 141.32, direct seconds:91.95, iqueryable seconds:93.96

Answer

I am not 100% sure that this article can describe current behavior as for Entity Framework version 6, but this thing should be related to query compilation in Entity Framework into stored procedures.

When you first call some query using Entity Framework, it has to be compiled by EF into an SQL statement - either a pure SELECT query, or a procedure using exec and parameters for it, example:

exec sp_executesql N'SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[Name] AS [Name], 
    [Extent1].[IssuedAt] AS [IssuedAt], 
    [Extent1].[Status] AS [Status], 
    [Extent1].[Foo_Id] AS [Foo_Id]
    FROM [dbo].[Activities] AS [Extent1]
    WHERE (N''Some Name'' = [Extent1].[Name]) AND ([Extent1].[Id] = @p__linq__0)',N'@p__linq__0 int',@p__linq__0=0

@p__linq__0 is a parameter in the query, so every time you change Id in the query code, Entity Framework will pick this exact same statement from query cache and call it without trying to compile SQL for it again. On the other hand N''Some Name'' = [Extent1].[Name] part is equal to code x.Name == "Some Name", I used a constant here so it was transformed not to query parameter, but to simple part of the query statement.

Each time you try to make a query, Entity Framework checks cache containing complied SQL statements to see if there is an already compiled statement it can re-use with parameters. If that statement is not found, Entity Framework has to compile C# query to Sql again. So if you have your queries small and fast-compiled, you won't notice anything, but if you have "hard-to-compile" queries with a lot of includes, conditions, transformations and built-in function usage, you can hit heavy penalties when your queries don't hit Entity Framework compiled queries cache.

You can see some similarity here with current work of paging without using overloads for Skip and Take, not hitting compiled query cache when changing page: Force Entity Framework to use SQL parameterization for better SQL proc cache reuse

You can face this effect when using constants in your code, and its effect is quite non-obvious. Let's compare these code pieces and SQL that EntityFramework produces (I omitted class definitions for brevity, should be pretty obvious):

Query 1

Sample Code:

var result = context.Activities
                    .Where(x => x.IssuedAt >= DateTime.UtcNow && x.Id == iteration)    
                    .ToList(); 

Produced Sql:

exec sp_executesql N'SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[Name] AS [Name], 
    [Extent1].[IssuedAt] AS [IssuedAt], 
    [Extent1].[Status] AS [Status], 
    [Extent1].[Foo_Id] AS [Foo_Id]
    FROM [dbo].[Activities] AS [Extent1]
    WHERE ([Extent1].[IssuedAt] >= (SysUtcDateTime())) AND ([Extent1].[Id] = @p__linq__0)',N'@p__linq__0 int',@p__linq__0=0

You can see that in this case condition x.IssuedAt >= DateTime.UtcNow is transformed to statement [Extent1].[IssuedAt] >= (SysUtcDateTime()).

Query 2

Sample Code:

var now = DateTime.UtcNow;

var result = context.Activities
                    .Where(x => x.IssuedAt >= now && x.Id == iteration)
                    .ToList();

Produced Sql:

exec sp_executesql N'SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[Name] AS [Name], 
    [Extent1].[IssuedAt] AS [IssuedAt], 
    [Extent1].[Status] AS [Status], 
    [Extent1].[Foo_Id] AS [Foo_Id]
    FROM [dbo].[Activities] AS [Extent1]
    WHERE ([Extent1].[IssuedAt] >= @p__linq__0) AND ([Extent1].[Id] = @p__linq__1)',N'@p__linq__0 datetime2(7),@p__linq__1 int',@p__linq__0='2016-10-09 15:27:37.3798971',@p__linq__1=0

In this case, you can see that condition x.IssuedAt >= now was transformed to [Extent1].[IssuedAt] >= @p__linq__0 - a parametrized statement, and DateTime value was passed as procedure argument.

You can clearly can see the difference here with Query 1 - condition was part of the query code without parameter, and it used built-in function for getting date time.

These 2 queries might give you a hint, that usage of constants in Entity Framework produces different queries from using only fields, properties, arguments, etc. It was a bit of synthetic example, let's check something more close to real query.

Query 3

Here I use enum ActivityStatus and want to query for Activity, that has specific Id, and I want to be able to get only activities, that have status "Active" (whatever that means).

Sample Code:

var result = context.Activities
    .Where(x => x.Status == ActivityStatus.Active 
                && x.Id == id)
    .ToList();

Produced Sql:

exec sp_executesql N'SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[Name] AS [Name], 
    [Extent1].[IssuedAt] AS [IssuedAt], 
    [Extent1].[Status] AS [Status], 
    [Extent1].[Foo_Id] AS [Foo_Id]
    FROM [dbo].[Activities] AS [Extent1]
    WHERE (0 = [Extent1].[Status]) AND ([Extent1].[Id] = @p__linq__0)',N'@p__linq__0 int',@p__linq__0=0

You can see, that usage of constant in condition x.Status == ActivityStatus.Active produces SQL 0 = [Extent1].[Status], which is correct. Status here is not parametrized, so if you call same query somewhere else using condition x.Status = ActivityStatus.Pending, that will produce another query, so calling it for the first time will cause Entity Framework query compilation. You can avoid it using Query 4 for both.

Query 4

Sample Code:

var status = ActivityStatus.Active;

var result = context.Activities
                    .Where(x => x.Status == status
                                && x.Id == iteration)
                    .ToList();

Produced Sql:

exec sp_executesql N'SELECT 
    [Extent1].[Id] AS [Id], 
    [Extent1].[Name] AS [Name], 
    [Extent1].[IssuedAt] AS [IssuedAt], 
    [Extent1].[Status] AS [Status], 
    [Extent1].[Foo_Id] AS [Foo_Id]
    FROM [dbo].[Activities] AS [Extent1]
    WHERE ([Extent1].[Status] = @p__linq__0) AND ([Extent1].[Id] = @p__linq__1)',N'@p__linq__0 int,@p__linq__1 int',@p__linq__0=0,@p__linq__1=0

As you can see, this query statement is fully parametrized, so changing status to Pending, Active, Inactive, etc. will still use the same query from the compiled queries cache.

Depending on your coding style, you might face this issue from time to time, when same 2 queries that have only different constant value will compile a query each. I can offer you to try same query with using booleans as constants, it should produce same result - having condition unparametrized.