Riya Riya - 6 months ago 27
SQL Question

Processing of SQL

I have a question regarding the processing of SQL code. Which of the following program is more efficient? Or is it same?

My understanding - the second program is more efficient as it subsets data before join

select a.custid, b.ref_date,b.status
from customertable a
left join accounts b
on a.custid = b.custid
where '2015-01-01' <= b.ref_date <= '2016-01-01'

select a.custid, b.ref_date,b.status
from customertable a
left join (select * from accounts
where '2015-01-01' <= ref_date <= '2016-01-01') b
on a.custid = b.custid

Answer

SQL is a declarative language, which means that you tell the systems what results you want and it decides how to fetch those results. So in most cases there's no way to say with certainty whether one query over another will be more "efficient", since the compiler has a lot of leeway in what plan it develops.

That said, your two queries are not equivalent.

suppose you had the following data:

customertable 
-------------

custid
------
1
2

accounts
--------

custid    refdate     
------    -------     
1         2015-01-01  
2         2014-01-01 

The results of the first query would not include customer 2, since the where clause would filter it out because there is a matching account, but the refdate is not in the required range. The second query would include customer 2, but the refdate and status would be null because the matching account did not satisfy the join clause.

The following query would be equivalent to your second one:

select a.custid, b.ref_date,b.status
from customertable a
left join accounts b
on a.custid = b.custid
  and '2015-01-01' <= b.ref_date <= '2016-01-01'

I would not expect a significant difference in efficiency between the two queries, since the plans the compiler comes up with are likely to be identical.

In general, the only way to say with certainty if one query is more efficient that the other is to try it both ways and measure the results. Even then, it may not always be more efficient, since changes over time such as data dispersion, efficiency of indexes, and other factors can make one query more efficient in a given context.