stack stack - 2 months ago 7
SQL Question

Should I try to reduce selected rows in the subqueries?

I have this query:

SELECT count(1)
FROM qanda question
JOIN qanda answer ON question.Id = answer.related
WHERE answer.related IS NOT NULL
AND answer.author_id = 29
AND question.amount IS NULL
AND date_time > unix_timestamp(DATE_SUB(now(), INTERVAL 1 YEAR))
AND answer.id not in (
select post_id
from votes
group by post_id
having sum(value) < 0)


Currently that subquery selects 890 rows in reality. I can reduce it to just 43 rows by adding this (the time frame constraint which is in the outer query) also to the subquery:

. . . where date_time > unix_timestamp(DATE_SUB(now(), INTERVAL 1 YEAR))


And I can reduce it more (to just 9 rows) by adding this line on the
WHERE
clause of that subquery:

. . . and table_code = 15


Ok, Some programmers believe:


Adding a where clause to the inner query would reduce the rows but would only remove rows that don't matter.


But I myself believe adding a
WHERE
clause for that inner query makes it more faster. Well I'm confuse, should I add those conditions on the
WHERE
clause of that inner query or it doesn't matter?

Answer

What about this one

SELECT count(1)
FROM qanda question
JOIN qanda answer ON question.Id = answer.related
WHERE answer.related IS NOT NULL
AND answer.author_id = 29
AND question.amount IS NULL
AND answer.date_time > unix_timestamp(DATE_SUB(now(), INTERVAL 1 YEAR))
AND 0 <= (
  select sum(value)
  from votes
  where post_id = answer.id
)

?

There are of course other ways.

However - your subquery is reading the entire votes table, while you only need those related to the authors answers. So you could limit the result set of your subquery using a join with the answers:

AND answer.id not in (
    select v.post_id
    from votes v
    join qanda a on a.id = v.post_id
    where a.author_id = 29
      and a.date_time > unix_timestamp(DATE_SUB(now(), INTERVAL 1 YEAR))
      and a.related IS NOT NULL
    group by v.post_id
    having sum(v.value) < 0
)