Edamame Edamame - 2 months ago 16
SQL Question

Impala: change the column type prior to perform the aggregation function for group by

I have a table, my_table:

transaction_id | money | team
--------------------------------------------
1 | 10 | A
2 | 20 | B
3 | null | A
4 | 30 | A
5 | 16 | B
6 | 12 | B


When I group by team, I can compute max, min through query:

select team, max(money), min(money) from my_table group by team


However, I can't do avg and sum because there is null. i.e:

select team, avg(money), sum(money) from my_table group by team


would fail.

Is there a way to change the column type prior to computing the avg and sum? i.e. I want the output to be:

team | avg(money) | sum(money)
--------------------------------------
A | 20 | 40
B | 16 | 48


Thanks!

Answer

Per documentation provided by Cloudera your query should be working as-is. Both AVG Function and SUM Function ignore null.

SELECT team, AVG(money), SUM(money)
FROM my_table
GROUP BY team

UPDATE: Per your comment, again I'm not familiar with Impala. Presumably standard SQL will work. Your error appears to be a datatype issue.

SELECT team, AVG(CAST(money AS INT)), SUM(CAST(money AS INT))
FROM my_table
GROUP BY team