Seibar Seibar - 1 month ago 8
SQL Question

What's the difference between a Table Scan and a Clustered Index Scan?

Since both a

Table Scan
and a
Clustered Index Scan
essentially scan all records in the table, why is a Clustered Index Scan supposedly better?

As an example - what's the performance difference between the following when there are many records?:

declare @temp table(
SomeColumn varchar(50)
)

insert into @temp
select 'SomeVal'

select * from @temp

-----------------------------

declare @temp table(
RowID int not null identity(1,1) primary key,
SomeColumn varchar(50)
)

insert into @temp
select 'SomeVal'

select * from @temp

Answer

In a table without a clustered index (a heap table), data pages are not linked together - so traversing pages requires a lookup into the Index Allocation Map.

A clustered table, however, has it's data pages linked in a doubly linked list - making sequential scans a bit faster. Of course, in exchange, you have the overhead of dealing with keeping the data pages in order on INSERT, UPDATE, and DELETE. A heap table, however, requires a second write to the IAM.

If your query has a RANGE operator (e.g.: SELECT * FROM TABLE WHERE Id BETWEEN 1 AND 100), then a clustered table (being in a guaranteed order) would be more efficient - as it could use the index pages to find the relevant data page(s). A heap would have to scan all rows, since it cannot rely on ordering.

And, of course, a clustered index lets you do a CLUSTERED INDEX SEEK, which is pretty much optimal for performance...a heap with no indexes would always result in a table scan.

So:

  • For your example query where you select all rows, the only difference is the doubly linked list a clustered index maintains. This should make your clustered table just a tiny bit faster than a heap with a large number of rows.

  • For a query with a WHERE clause that can be (at least partially) satisfied by the clustered index, you'll come out ahead because of the ordering - so you won't have to scan the entire table.

  • For a query that is not satisified by the clustered index, you're pretty much even...again, the only difference being that doubly linked list for sequential scanning. In either case, you're suboptimal.

  • For INSERT, UPDATE, and DELETE a heap may or may not win. The heap doesn't have to maintain order, but does require a second write to the IAM. I think the relative performance difference would be negligible, but also pretty data dependent.

Microsoft has a whitepaper which compares a clustered index to an equivalent non-clustered index on a heap (not exactly the same as I discussed above, but close). Their conclusion is basically to put a clustered index on all tables. I'll do my best to summarize their results (again, note that they're really comparing a non-clustered index to a clustered index here - but I think it's relatively comparable):

  • INSERT performance: clustered index wins by about 3% due to the second write needed for a heap.
  • UPDATE performance: clustered index wins by about 8% due to the second lookup needed for a heap.
  • DELETE performance: clustered index wins by about 18% due to the second lookup needed and the second delete needed from the IAM for a heap.
  • single SELECT performance: clustered index wins by about 16% due to the second lookup needed for a heap.
  • range SELECT performance: clustered index wins by about 29% due to the random ordering for a heap.
  • concurrent INSERT: heap table wins by 30% under load due to page splits for the clustered index.