dinesh707 dinesh707 - 6 months ago 10
SQL Question

How to add aggregation function to non grouped column which is not in select

I have a table

tab
, which contains columns
a,b,c,d
. But the following query will not work since the
c
is not in the group by clause or in a reduction function.

SELECT a, b, c FROM tab GROUP BY a, b;


But what i want is to select
c
based on maximum value of
d
. How can I do this query in PostgreSQL ?.

| a | b | c | d |
| 1 | 2 | 3 | 100 |
| 1 | 2 | 4 | 110 |
| 1 | 2 | 5 | 90 |


As the output I need the result in row 2, because the value in d is the highest.

Answer

Classic top-n-per-group. One way to do it using ROW_NUMBER:

WITH
CTE
AS
(
    SELECT
        a, b, c
        ,ROW_NUMBER() OVER(PARTITION BY a, b ORDER by d DESC) AS rn
    FROM tab
)
SELECT
    a, b, c
FROM CTE
WHERE rn = 1;

Index on (a, b, d, c) should help.

Approach with ROW_NUMBER works well when a table has few rows per group and the server has to read most of the table. For example, a table has 1 million rows and 800K distinct groups of (a, b). You'd have to read most rows any way.

If the table has 1 million rows and only 20 distinct groups of (a, b) it would be better to do 20 seeks of an appropriate index instead of reading all rows.