91DarioDev 91DarioDev - 3 years ago 190
SQL Question

Should one avoid repeating aggregate functions by using subqueries and aliases?

this query shorter:

SELECT main.num_msgs, main.num_grps, main.rnk
FROM (
SELECT
user_id,
COUNT(distinct group_id) AS num_grps,
COUNT(*) AS num_msgs,
RANK() OVER(ORDER BY COUNT(*)DESC, COUNT(distinct group_id) DESC, user_id DESC) rnk
FROM messages
WHERE message_date > date_trunc('week', now())
GROUP BY user_id
) AS main
WHERE main.user_id = %s


gives me exactly the same results of this one longer:

SELECT main.num_msgs, main.num_grps, main.rnk
FROM (
SELECT
user_id,
num_grps,
num_msgs,
RANK() OVER(ORDER BY num_msgs DESC, num_grps DESC, user_id DESC) rnk
FROM (
SELECT
user_id,
COUNT(distinct group_id) AS num_grps,
COUNT(*) AS num_msgs
FROM messages
WHERE message_date > date_trunc('week', now())
GROUP BY user_id
) AS sub
) AS main
WHERE main.user_id = %s


The second one is longer because it uses another sub-query and in this way it can avoid to repeat the aggregate functions in
PARTITION BY
using aliases, while the first one is shorter and does not have the sub-query but repeats the aggregate functions because it can't uses aliases.

Which way is to prefer?

Answer Source

It seems that both queries are equivalent, except of the lack of DESC:

RANK() OVER(ORDER BY COUNT(*) DESC, ...

in the first one. This is probably an oversight.

The Postgres optimizer should generate the same plan for both queries, so use the shorter one. In case you are in doubt execute EXPLAIN ANALYSE for the queries and compare generated plans.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download