Haniball Haniball - 5 months ago 6
SQL Question

SQL - Correlated subqueries

I have problems in understanding this correlated subquery:

SELECT Country, CustomerID,
(SELECT COUNT(*)
FROM Customers AS CustomersInner
WHERE CustomersInner.CustomerID < CustomersOuter.CustomerID
AND CustomersInner.Country=CustomersOuter.Country) + 1
AS customer_seq_num
FROM Customers AS CustomersOuter ORDER BY Country;


This example is from http://www.w3schools.com/sql/trysql.asp?filename=trysql_select_all

How does this query work exactly? Why is there a comparison between CustomersInner.CustomerID < CustomersOuter.CustomerID ? And how can the +1 be explained? A deeper explanation would be great.

Answer

The correlated subquery is executed for every row in the main query. In this case, main query is

SELECT Country, CustomerID
  FROM Customers As CustomersOuter
 ORDER BY Country

This query returns all customers IDs order by country.

Country   | Customer ID
-----------------------
Argentina | 12
Argentina | 54
Argentina | 64
Austria   | 20
Austria   | 59

Correlated subquery is executed for each country / customerID pair. In this case:

For first row:

SELECT COUNT(*)
  FROM Customers AS CustomersInner
 WHERE CustomersInner.CustomerID < 12
   AND CustomersInner.Country='Argentina') + 1

As there's not a Customer with ID < 12 in Argentina count(*) returns zero with +1 you get 1

For second row:

SELECT COUNT(*)
  FROM Customers AS CustomersInner
 WHERE CustomersInner.CustomerID < 54
   AND CustomersInner.Country='Argentina') + 1

result will be 1 (customerID = 12) + 1 = 2

and so on...

When you change the Country, the count(*) begins again in zero, so you have a running counter: for each country counts the number of customers.

Country   | Customer ID | customer_seq_num
------------------------------------------
Argentina | 12          | 1
Argentina | 54          | 2
Argentina | 64          | 3
Austria   | 20          | 1    <--- change country, reset counter (no customerID < 20 in 'Austria')
Austria   | 59          | 2