Sagar Sagar - 5 months ago 6
SQL Question

Why sorting of records in table is not as per clustered index?

I have been preparing for interview and just come to these thing.

I have executed following statements:

create table trial
Id int not null,
Name varchar(10)

alter table trial add constraint unq unique clustered (Name)

alter table trial add constraint pk primary key nonclustered(Id)

insert into trial values (1,'a'),(3,'d'),(5,'b'),(2,'c')

select * from trial

Result is as shown here:

enter image description here

My question is: Why result is not sorted as per name column as name column has clustered index?

Result is:

1 a
2 c
3 d
5 b

How to use index for physical sorting of table?


In this case I am sure optimizer decided to do a full table scan or nonclustered index scan, since it is very small. You can include actual execution plan and see this:

enter image description here

You can force to use clustered index:


And you will probably get:

enter image description here

and result set:

Id  Name
1   a
5   b
2   c
3   d

But you should not really do this, because ordering is still not guarantied. If you want your results to be sorted by some columns, do it explicitly!

I will copy a fragment from the book Exam 70-461: Querying Microsoft SQL Server 2012 where you can get some good explanation:

It might seem like the output is sorted by empid, but that’s not guaranteed. What could be more confusing is that if you run the query repeatedly, it seems like the result keeps being returned in the same order; but again, that’s not guaranteed. When the database engine (SQL Server in this case) processes this query, it knows that it can return the data in any order because there is no explicit instruction to return the data in a specific order. It could be that, due to optimization and other reasons, the SQL Server database engine chose to process the data in a particular way this time. There’s even some likelihood that such choices will be repeated if the physical circumstances remain the same. But there’s a big difference between what’s likely to happen due to optimization and other reasons and what’s actually guaranteed.

The database engine may—and sometimes does—change choices that can affect the order in which rows are returned, knowing that it is free to do so. Examples for such changes in choices include changes in data distribution, availability of physical structures such as indexes, and availability of resources like CPUs and memory. Also, with changes in the engine after an upgrade to a newer version of the product, or even after application of a service pack, optimization aspects may change. In turn, such changes could affect, among other things, the order of the rows in the result.

In short, this cannot be stressed enough: A query that doesn’t have an explicit instruction to return the rows in a particular order doesn’t guarantee the order of rows in the result. When you do need such a guarantee, the only way to provide it is by adding an ORDER BY clause to the query, and that’s the focus of the next section.

EDIT based on comments:

The thing is that even if you use clustered index it may return unordered set. Suppose you have physical order of clustered keys like (1, 2, 3, 4, 5). Most of the time you will get (1, 2, 3, 4, 5) but there can be situations when optimizer decides to do parallel reads and say it has 2 parallel reads and it reads (1, 2, 3) and (4, 5). Now it may happen that (4, 5) will be returned first and then (1, 2, 3) can be returned. If you have no order by clause engine will not spend its resources ordering that set and will give you (4, 5, 1, 2, 3). So this explains why you should always ensure you have order by clause when you want ordering.