Eva Eva - 1 year ago 104
SQL Question

SQL correlated subqueries

I have been working on this exercise on Codecademy for several days and still couldn't understand the logic behind it.Below is the exercise and code:

It would be interesting to order flights by giving them a sequence number based on time, by carrier.
For instance, assuming flight_id increments with each additional flight, we could use the following query to view flights by carrier, flight id, and sequence number:

SELECT carrier, id,
FROM flights f
WHERE f.id < flights.id
AND f.carrier=flights.carrier) + 1 AS flight_sequence_number
FROM flights;

I can understand that
is a virtual form of table
, but what does
f.id < flights.id
do? Does it mean that SQL compares each row in
with each row in

compare MQ 17107 with MQ 7869,

compare MQ 17107 with MQ 2205,

compare MQ 17107 with MQ 14979


compare MQ 7869 with MQ 2205,

compare MQ 7869 with MQ 14979


Beside, what does this
really count? And why the

This is the result image: query result

Any help will be appreciated. Thanks.

Answer Source

The query selects records from table flights. For each record it selects carrier, id and the flight_sequence_number.

So for every record in flights the subquery gets executed. It reads the table flights again, but only takes records with the same carrier and a flight ID smaller than the one of the main record. In order to talk about the main query record and the sub query record, you need one or two table aliases, for otherwise both records would be called flight as in flight.id, and it would not be clear which one you are talking of. So the table in the inner query gets the alias f. Now you can compare f.id < flights.id and f.carrier = flights.carrier.

COUNT(*) counts hence all records lower than a main record's flight ID for its carrier and thus numbers the carrier's rows. For the smallest ID you find no smaller IDs so the count is 0 and you add one thus getting row number 1. For the second smallest you find one record with a smaller ID, hence you get count 1, add 1 and get row number 2 and so on.

The results will look better, when you add an order by clause ORDER BY carrier, id, so you show the results in the order used.

As has been mentioned, several modern DBMS offer analytic functions, such as ROW_NUMBER and the query becomes much simpler:

  row_number() over (partition by carrier order by id) as flight_sequence_number
from flights
order by carrier, id;