user5495762 user5495762 - 3 months ago 11
MySQL Question

How to get average per group and figure out outliers in SQL

This is what my data looks like:

id | value | group
------------------
1 | 4 | abc
2 | 8 | def
3 | 100 | abc
4 | 8 | ghi
5 | 7 | abc
6 | 10 | ghi


I need to figure out the averages per group where outliers (for e.g. id = 3 for group = abc) are excluded. Then display the ouliers next to averages. For above data I am expecting something like this as result:

group = 'abc'
average = '5.5'
outlier = '100'

Answer

One method creates a subquery containing the stats for each group (mean and standard deviation), and then joins this back to the original table to determine which records are outliers, for which group.

SELECT t1.id,
       t1.group AS `group`,
       t2.valAvg AS average,
       t1.value AS outlier
FROM yourTable t1
INNER JOIN
(
    SELECT `group`, AVG(value) AS valAvg, STDDEV(value) AS valStd
    FROM yourTable
    GROUP BY `group`
) t2
    ON t1.group = t2.group
WHERE ABS(t1.value - t2.valAvg) > t2.valStd   -- any record whose value is MORE
                                              -- than one standard deviation from
                                              -- the mean is an outlier

Update:

It appears that, for some reason, your value column is actual varchar rather than a numeric type. This means you won't be able to do any math on it. So first, convert that column to integer via:

ALTER TABLE yourTable MODIFY value INTEGER;

If you only want outliers which are greater than the average then use the following WHERE clause:

WHERE t1.value - t2.valAvg > t2.valStd