stack stack - 1 year ago 41
SQL Question

Should I try to reduce selected rows in the subqueries?

I have this query:

SELECT count(1)
FROM qanda question
JOIN qanda answer ON question.Id = answer.related
WHERE answer.related IS NOT NULL
AND answer.author_id = 29
AND question.amount IS NULL
AND date_time > unix_timestamp(DATE_SUB(now(), INTERVAL 1 YEAR))
AND answer.id not in (
select post_id
from votes
group by post_id
having sum(value) < 0)


Currently that subquery selects 890 rows in reality. I can reduce it to just 43 rows by adding this (the time frame constraint which is in the outer query) also to the subquery:

. . . where date_time > unix_timestamp(DATE_SUB(now(), INTERVAL 1 YEAR))


And I can reduce it more (to just 9 rows) by adding this line on the
WHERE
clause of that subquery:

. . . and table_code = 15


Ok, Some programmers believe:


Adding a where clause to the inner query would reduce the rows but would only remove rows that don't matter.


But I myself believe adding a
WHERE
clause for that inner query makes it more faster. Well I'm confuse, should I add those conditions on the
WHERE
clause of that inner query or it doesn't matter?

Answer Source

What about this one

SELECT count(1)
FROM qanda question
JOIN qanda answer ON question.Id = answer.related
WHERE answer.related IS NOT NULL
AND answer.author_id = 29
AND question.amount IS NULL
AND answer.date_time > unix_timestamp(DATE_SUB(now(), INTERVAL 1 YEAR))
AND 0 <= (
  select sum(value)
  from votes
  where post_id = answer.id
)

?

There are of course other ways.

However - your subquery is reading the entire votes table, while you only need those related to the authors answers. So you could limit the result set of your subquery using a join with the answers:

AND answer.id not in (
    select v.post_id
    from votes v
    join qanda a on a.id = v.post_id
    where a.author_id = 29
      and a.date_time > unix_timestamp(DATE_SUB(now(), INTERVAL 1 YEAR))
      and a.related IS NOT NULL
    group by v.post_id
    having sum(v.value) < 0
)