I'm having trouble getting my head around evaluating correlated subqueries. An example is using a correlated subquery in SELECT so that GROUP BY isn't needed:
Consider the relations:
Movies : Title, Director Length
Schedule : Theatre, Title
SELECT S.Theater, MAX(M.Length)
FROM Movies M JOIN Schedule S ON M.Title=S.Title
GROUP BY S.Theater
SELECT DISTINCT S.theater,
FROM Movies M
FROM Schedule S
From a conceptual standpoint, imagine that the database is going through each row of the result without the subquery:
SELECT DISTINCT S.Theater, S.Title FROM Schedule S
And then, for each one of those, running the subquery for you:
SELECT MAX(M.Length) FROM Movies M WHERE M.Title = (whatever S.Title was)
And placing that in as the value. Really, it's not (conceptually) that different from using a function:
SELECT DISTINCT S.Theater, SUBSTRING(S.Title, 1, 5) FROM Schedule S
It's just that this function performs a query against another table, instead.
I do say conceptually, though. The database may be optimizing the correlated query into something more like a join. Whatever it does internally matters for performance, but doesn't matter as much for understanding the concept.
But, it may not return the results you're expecting. Consider the following data (sorry sqlfiddle seems to be erroring atm):
CREATE TABLE Movies ( Title varchar(255), Length int(10) unsigned, PRIMARY KEY (Title) ); CREATE TABLE Schedule ( Title varchar(255), Theater varchar(255), PRIMARY KEY (Theater, Title) ); INSERT INTO Movies VALUES ('Star Wars', 121); INSERT INTO Movies VALUES ('Minions', 91); INSERT INTO Movies VALUES ('Up', 96); INSERT INTO Schedule VALUES ('Star Wars', 'Cinema 8'); INSERT INTO Schedule VALUES ('Minions', 'Cinema 8'); INSERT INTO Schedule VALUES ('Up', 'Cinema 8'); INSERT INTO Schedule VALUES ('Star Wars', 'Cinema 6');
And then this query:
SELECT DISTINCT S.Theater, ( SELECT MAX(M.Length) FROM Movies M WHERE M.Title = S.Title ) AS MaxLength FROM Schedule S;
You'll get this result:
+----------+-----------+ | Theater | MaxLength | +----------+-----------+ | Cinema 6 | 121 | | Cinema 8 | 91 | | Cinema 8 | 121 | | Cinema 8 | 96 | +----------+-----------+
As you can see, it's not a replacement for GROUP BY (and you can still use GROUP BY), it's just running the subquery for each row. DISTINCT will only remove duplicates from the result. It's not giving the "greatest length" per theater anymore, it's just giving each unique movie length associated with the theater name.
PS: You might likely use an ID column of some sort to identify movies, rather than using the Title in the join. This way, if by chance the name of the movie has to be amended, it only needs to change in one place, not all over Schedule too. Plus, it's faster to join on an ID number than a string.