user2441441 user2441441 - 7 months ago 37
SQL Question

SQL/Hive count distinct column

How do I do this in Hive?

columnA columnB columnC
100.10 50.60 30
100.10 50.60 30
100.10 50.60 20
100.10 70.80 40


Output should be:

columnA columnB No_of_distinct_colC
100.10 50.60 2
100.10 70.80 1


Query that I think is correct:

SELECT columnA,columnB,COUNT(distinct column C)
from table_name
group by columnA,columnB


Is this correct? SQL is fine too.

UPDATE: How do I find the standard deviation of columnC? Need this asap.

Answer

Yes, it is almost correct. But you have one simple mistake. Your column name is wrong inside COUNT.

SELECT columnA,columnB,COUNT(DISTINCT columnC) No_of_distinct_colC
from table_name
group by columnA,columnB
Comments