analina analina - 3 months ago 5x
SQL Question

SQL: query maximum number of distinct values grouping by multiple columns

For a table consisting of three columns (A, B, C), I want to do the following. For every distinct value of column A, I want to find the value of B, for which I have the most distinct values of C. So, I need to group first by A, then by B and then count the distinct values of C for those groups and find the maximum. However, I couldn't figure out the correct combination of group by, max, count distinct etc. to achieve this.


This is a pain in MySQL because it doesn't support window functions or CTEs. Perhaps the easiest method is to use double aggregation and a string trick:

select a,
       substring_index(group_concat(b order by cnt desc), ',', 1) as MostCommon
from (select a, b, count(distinct c) as cnt
      from t
      group by a, b
     ) ab
group by a;

Note: For large data, you need to be concerned about the length of the intermediate string. The default length can be expanded.

Another solution is to use variables:

select a, b
from (select a, b, cnt,
             (@rn := if(@a = a, @rn + 1,
                        if(@a := a, 1, 1)
             ) rn
      from (select a, b, count(distinct c) as cnt
            from t
            group by a, b
           ) ab cross join
           (select @a := '', @rn := 0) params
      order by a, cnt desc
     ) t
where rn = 1;