Paul Benn Paul Benn - 28 days ago 12
SQL Question

SQL aggregate function alias

I'm a beginner at SQL and this is the question I have been asked to solve:


Say that a big city is defined as a
place
of type
city
with a population of at
least 100,000. Write an SQL query that returns the scheme
(state_name,no_big_city,big_city_population)
ordered by
state_name
, listing those states which have either (a) at least five big cities or (b) at least one million people living in big cities. The column
state_name
is the
name
of the
state
,
no_big_city
is the number of big cities in the state, and
big_city_population
is the number of people living in big cities in the state.


Now, as far as I can see, the following query returns correct results:

SELECT state.name AS state_name
, COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) AS no_big_city
, SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) AS big_city_population
FROM state
JOIN place
ON state.code = place.state_code
GROUP BY state_name
HAVING
COUNT(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN 1 ELSE NULL END) >= 5 OR
SUM(CASE WHEN place.type = 'city' AND place.population >= 100000 THEN place.population ELSE NULL END) >= 1000000
ORDER BY state_name;


However, the two aggregate functions used in the code appear twice. MY question: is there any way of making this code duplication disappear preserving functionality?

To be clear, I have already tried using the alias, but I just get a "column does not exist" error. Any help is appreciated.

Answer

The manual clarifies:

An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.

Bold emphasis mine.

You can avoid typing long expressions repeatedly with a subquery or CTE:

SELECT state_name, no_big_city, big_city_population
FROM  (
   SELECT s.name AS state_name
        , COUNT(*)        FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS no_big_city
        , SUM(population) FILTER (WHERE p.type = 'city' AND p.population >= 100000) AS big_city_population
   FROM   state s
   JOIN   place p ON s.code = p.state_code
   GROUP  BY s.name -- can be input column name as well, best schema-qualified to avoid ambiguity
   ) sub
WHERE  no_big_city >= 5
   OR  big_city_population >= 1000000
ORDER  BY state_name;

While being at it, I simplified with the aggregate FILTER clause (Postgres 9.4+):