Trapezera Buscando Trapezera Buscando - 23 days ago 9
MySQL Question

SQL - Most frequent value in column of joined tables

I have three tables described below:

Area (Id, Description)

City(Id, Name)

Problem(Id, City, Area, Definition):
City references City (Id), Area references Area (Id)


I want to find the most frequent value of Area(Description) that appears in Problem for each City (Name).

Example:

Area
Id Description
1 Support
2 Finance

City
Id Name
1 Chicago
2 Boston

Problem
Id City Area Definition
1 1 2 A
2 1 2 B
3 1 1 C
4 2 1 D


Desired Output:

Name Description
Chicago Finance
Boston Support


Here's what I have tried with no success :

SELECT Name,
Description
FROM
(SELECT *
FROM Problem AS P,
City AS C,
Area AS A
WHERE C.Id = P.City
AND A.Id = P.Area ) AS T1
WHERE Description =
(SELECT Description
FROM
(SELECT *
FROM Problem AS P,
City AS C,
Area AS A
WHERE C.Id = P.City
AND A.Id = P.Area ) AS T2
WHERE T1.Name = T2.Name
GROUP BY Description
ORDER BY Count(Name) DESC LIMIT 1 )
GROUP BY Name,
Description


Thanks!

Answer

This is probably the shortest way to solve your issue:

select c.Name, a.Description
from City c
cross join Area a
where a.Id = (
    select p.Area
    from Problem p
    where p.City = c.Id
    group by p.Area
    order by count(*) desc, p.Area asc
    limit 1
)

We use a CROSS JOIN to combine every City with every Area. But we pick only the Area with the highest count in the Problem table for the given city, which is determined in the correlated subquery. If two areas have the same highest count for a city, the one coming first alphabetically will be picked (order by ... p.Area asc).

Here's another more complex solution which includes the count.

select c.Name, a.Description, city_area_maxcount.mc as problem_count
from (
    select City, max(c) as mc
    from (
        select p.City, p.Area, count(*) as c
        from problem p
        group by p.City, p.Area
    ) city_area_count
    group by City
) city_area_maxcount
join (
    select p.City, p.Area, count(*) as c
    from problem p
    group by p.City, p.Area
) city_area_count
    on  city_area_count.City = city_area_maxcount.City
    and city_area_count.c = city_area_maxcount.mc
join City c on c.Id = city_area_count.City
join Area a on a.Id = city_area_count.Area

The subquery alisaed as city_area_maxcount is used twice here (i hope mysql can cache the result). If you think of it as a table, that would be a common find-the-row-with-top-value-per-group problem. If two areas have the same highest count for a city, both will be selected.