For a table consisting of three columns (A, B, C), I want to do the following. For every distinct value of column A, I want to find the value of B, for which I have the most distinct values of C. So, I need to group first by A, then by B and then count the distinct values of C for those groups and find the maximum. However, I couldn't figure out the correct combination of group by, max, count distinct etc. to achieve this.
This is a pain in MySQL because it doesn't support window functions or CTEs. Perhaps the easiest method is to use double aggregation and a string trick:
select a, substring_index(group_concat(b order by cnt desc), ',', 1) as MostCommon from (select a, b, count(distinct c) as cnt from t group by a, b ) ab group by a;
Note: For large data, you need to be concerned about the length of the intermediate string. The default length can be expanded.
Another solution is to use variables:
select a, b from (select a, b, cnt, (@rn := if(@a = a, @rn + 1, if(@a := a, 1, 1) ) ) rn from (select a, b, count(distinct c) as cnt from t group by a, b ) ab cross join (select @a := '', @rn := 0) params order by a, cnt desc ) t where rn = 1;