Daniel Valland Daniel Valland - 4 months ago 12
MySQL Question

Performance in mysql joins with subquery and limit

In a join operation between two subqueries (or a table and a subquery), is it preferred to specify the LIMIT clause in an inner query rather than on the outer query (since the order would determine the amount of rows the DBMS would have to iterate to check the where clause)? like:

((
SELECT id
FROM Table1
WHERE score>=100
LIMIT 10)
AS A NATURAL JOIN Table2
))


would be better than

((
SELECT id
FROM Table1
WHERE score>=100)
AS A NATURAL JOIN Table2
))
LIMIT 10


My thinking is that in the last query, the DBMS first have to iterate (full table or an index) ALL rows in Table1 where score>=100 that can be mapped to Table2 on their common columns (which could be any number of rows), and only after that it will truncate to only 10 rows, whereas in the first query, it will only scan until it has found 10 rows from Table1 that satisfy the where clause and can be mapped to Table2, then stop....

Answer

The 2 partial statements are not equivalent. When using LIMIT order matters. If you're placing the limit on Table1 you might never see the rows you would have otherwise seen with limit placed on the whole dataset. Given that disclaimer, it seems like using the limit and then joining would be more efficient, but rule of thumb is you should always measure.

Also consider that instead of joining the SELECT as table, for which MySQL will have to build an internal temporary table, you could join the table itself, i.e.:

SELECT t0.col0, t1.col1
FROM
    Table0 t0
    JOIN Table1 t1 ON (t0.col0 = t1.col0 AND t1.score >= 100)

which might be even more efficient if you have good indexes and end up using them. But again, you should measure.