Jacob Jacob - 6 months ago 19
MySQL Question

How come this SQL query works?

This is a exercise from wikibooks, #6

The table schema is enter image description here

The question is:


For each piece, find the most expensive offering of that piece and include the piece name, provider name, and price (note that there could be two providers who supply the same piece at the most expensive price).


The solution:

SELECT Pieces.Name, Providers.Name, Price
FROM Pieces INNER JOIN Provides ON Pieces.Code = Piece
INNER JOIN Providers ON Providers.Code = Provider
WHERE Price =
(
SELECT MAX(Price) FROM Provides
WHERE Piece = Pieces.Code
);


I don't understand the subquery part. I think the typical way to find the max price is

SELECT MAX(Price) FROM Provides
group by piece;


In this way, I can not use price = SUBQUERY or price in SUBQUERY, and that sub query in solutions, looks like

SELECT MAX(Price) FROM Provides, pieces where provides.piece=pieces.code;


It just return the biggest number, I can not figure out why it can 'group' and return the right rows.

Answer

Although Thomas does provide a much simpler solution, lets get back to your original question, WHY does it work.

SELECT Pieces.Name, Providers.Name, Price
   FROM Pieces INNER JOIN Provides ON Pieces.Code = Piece
               INNER JOIN Providers ON Providers.Code = Provider
   WHERE Price =
   (
     SELECT MAX(Price) FROM Provides
     WHERE Piece = Pieces.Code
   );

First, I hate working with correlated subqueries as in this answer. A correlated subquery is one that the subquery is processed once for EACH record. Notice the outer part of the query provides the "Pieces" table reference. So the inner query is saying from the "PROVIDES" table, give me the maximum price for the current "Pieces.code" value. After that, it is a simple join to the other tables to grab the piece and provider details.

My personal preference is to do a pre-aggregate subquery ONCE on the "Provides" table by all codes with its own group by. This runs the query once, grouped so there is only one record per respective code. More often you will see queries like this to prevent the larger overhead. Also, it is good to work with alias names, especially if you are dealing ex using an alias.

from LongTableNamesInYourDatabase LTN

And whenever you work with multiple tables, always try to provide the table.column or alias.column so others trying to help you in the future know where a specific column is coming from and not just guessing.

SELECT 
      P.Name, 
      Prov.Name, 
      MaxByPiece.MaxPrice
   FROM
      ( SELECT 
              Pr1.Piece,
              MAX( Pr1.Price ) as MaxPrice
           FROM 
              Provides Pr1
           group by
              Pr1.Piece ) as MaxByPiece
         JOIN Provides Pr2 on MaxByPiece.Piece = Pr2.Piece AND MaxByPiece.MaxPrice = Pr2.Price
           JOIN Pieces P on Pr2.Piece = P.Code
           JOIN Providers Prov on Pr2.Provider = Prov.Code

It may look more complex, but is more applicable if you have multiple tables with multiple rows for a given thing (contract, order, person, sales rep, whatever) and would otherwise end up with a Cartesian result and wonder why the duplicates in the counts or totals.

The first from clause query does nothing but get the maximum price per any piece and I use the alias Pr1 to differentiate that from the next join. The join after that is the provides so we can find ALL Pieces at that price. Remember the question wanted ALL Providers at that maximum price. So Now, I have the records that qualified at the max price per piece and provider. So I finish by joining to those lookup tables so I can get the names

Comments