roger roger -4 years ago 165
SQL Question

How can I get group data and collect the result into a map in Hive?

I have a table like this:

id | job | school |
1 | programmer | school1 |
2 | programmer | school1 |
3 | programmer | school2 |
4 | pm | school3 |
5 | pm | school2 |
6 | pm | school3 |


I want to do the following:


  1. Group by job

  2. Get school list and count, like this[(school1, 2), (school2, 1)]

  3. School list order by count, so can NOT be [(school1, 1), (school1, 2)]



The result of the example is:

programmer | [(school1, 2), (school2, 1)]
pm | [(school3, 2), (school2, 1)]

Answer Source

Just add the Brickhouse jar and create a collect() function

add jar ./brickhouse-0.7.1.jar;
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';

select job
  , collect(school, c) school_count_map
from (
  select *
  from (
    select job, school
      , count( * ) c
    from table
    group by job, school ) x
  order by job, c desc) y
group by job
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download