pikkewyn pikkewyn - 5 months ago 24
SQL Question

SQL Execution Plan

I have two similar tables,

CREATE TABLE [dbo].[StockPrices] (
[Id] INT IDENTITY (1, 1) NOT NULL,
[CompanyId] INT NOT NULL,
[Date] DATETIME NOT NULL,
[Open] DECIMAL (18, 2) NOT NULL,
[Close] DECIMAL (18, 2) NOT NULL,
[Low] DECIMAL (18, 2) NOT NULL,
[High] DECIMAL (18, 2) NOT NULL,
[Volume] INT NOT NULL,
CONSTRAINT [PK_dbo.StockPrices] PRIMARY KEY NONCLUSTERED ([Id] ASC),
CONSTRAINT [FK_dbo.StockPrices_dbo.Companies_CompanyId] FOREIGN KEY ([CompanyId]) REFERENCES [dbo].[Companies] ([Id]) ON DELETE CASCADE
);

GO
CREATE CLUSTERED INDEX [IX_CompanyId] ON [dbo].[StockPrices]([CompanyId] ASC);

GO
CREATE NONCLUSTERED INDEX [IX_Date] ON [dbo].[StockPrices]([Date] ASC);


and

CREATE TABLE [dbo].[News] (
[Id] INT IDENTITY (1, 1) NOT NULL,
[NewsProviderId] INT NOT NULL,
[CompanyId] INT NOT NULL,
[Date] DATETIME NOT NULL,
[Title] NVARCHAR (128) NOT NULL,
[Description] NVARCHAR (256) NOT NULL,
[Url] NVARCHAR (256) NOT NULL,
CONSTRAINT [PK_dbo.News] PRIMARY KEY NONCLUSTERED ([Id] ASC),
CONSTRAINT [FK_dbo.News_dbo.Companies_CompanyId] FOREIGN KEY ([CompanyId]) REFERENCES [dbo].[Companies] ([Id]) ON DELETE CASCADE
);
GO

CREATE CLUSTERED INDEX [IX_CompanyId] ON [dbo].[News]([CompanyId] ASC);
GO

CREATE NONCLUSTERED INDEX [IX_Date] ON [dbo].[News]([Date] ASC);
GO


and two similar queries

select *
from news
where companyid = 1
and date >= '01/01/2010'
and date <= '01/31/2010'
order by date;

select *
from stockprices
where companyid = 1
and date >= '01/01/2010'
and date <= '01/31/2010'
order by date;


and I am getting two completely different actual execution plans

`Query1: Relative to the batch: 86%
SELECT (COST 0%) <- Nested Loops (Inneer Join)(Cost 0%) <- Index Seek (NonClustered) [News].[IX_Date](Cost 1%)
<- Key Lookup (Clustered) [News].[IX_CompanyId](Cost 99%)

Query 2: Relative to the batch: 14%
SELECT (Cost0%) <- Sort (Cost 33%) <- Clustered Index Scan (Clustered) [StockPrices]IX_CompanyId`

I am not sure why? Can you advice something?

Thank you

Answer

The first one is using a seek on a non covering clustered index in date order and a key lookup to get the remaining columns for rows matching companyid = 1.

The second one is doing a scan on a covering index and then sorting the filtered result.

This is a cost based decision depending on what proportion of the table is estimated to match and the width of the two indexes (some example calculations here).

The tipping point can be a very low proportion of the table as the cost based optimiser costs an index scan as cheaper per page read than the random IO of lookups. The fact that the non covering index can avoid a sort in this case may make the tipping point a bit higher than otherwise.

Look at the estimated rows from each. Also consider that the News table contains various string columns and likely fits less rows on a clustered index page than the numeric values in the StockPrices table - so a full clustered index scan on News may well be relatively more expensive and cause a higher tipping point.