bublitz bublitz - 20 days ago 6
SQL Question

SQL Group by within another group by

I have currently a problem where I need to find a count of groups within a group. However, I want to create on line for the encompassing group and concatenate the smaller group within a string.

The following table shows the situation I am currently facing:

----------------------------------------
Timestamp ID Member
----------------------------------------
1 1 A
1 1 B
1 2 A
1 2 B
1 2 C
2 1 A
2 2 A
2 2 A
2 2 C
----------------------------------------


Now I want to find for each timestamp, how many (distinct) members of each ID are in the table, i.e., the result should look like this:




Timestamp MemberIDCount
----------------------------------------
1 1:2,2:3
2 1:1,2:2
----------------------------------------


i.e., the format of the string is:

[ID]:[count(distinct(member)],...


I know, that you can solve this with two consecutive group bys (i.e., first on timestamp and ID to count members and the on timestamp for the string concatenation). However, I hope that there is a smarter solution, since I have to apply that to a large dataset and dont want to have to execute 2 group bys. I work with Cloudera Impala, but solutions in other SQL languages are also appreciated.

Thank you for your help.

Answer

You can do this using group by twice and group_concat(). I don't have Impala on hand, but something like this should work:

select timestamp,
       group_concat(concat_ws(':', member, cnt))
from (select timestamp, member, count(*) as cnt
      from t
      group by timestamp, member
     ) tm
group by timestamp
order by timestamp;