user5495762 - 9 months ago 38

MySQL Question

This is what my data looks like:

`id | value | group`

------------------

1 | 4 | abc

2 | 8 | def

3 | 100 | abc

4 | 8 | ghi

5 | 7 | abc

6 | 10 | ghi

I need to figure out the averages per group where outliers (for e.g. id = 3 for group = abc) are excluded. Then display the ouliers next to averages. For above data I am expecting something like this as result:

`group = 'abc'`

average = '5.5'

outlier = '100'

Answer

One method creates a subquery containing the stats for each group (mean and standard deviation), and then joins this back to the original table to determine which records are outliers, for which group.

```
SELECT t1.id,
t1.group AS `group`,
t2.valAvg AS average,
t1.value AS outlier
FROM yourTable t1
INNER JOIN
(
SELECT `group`, AVG(value) AS valAvg, STDDEV(value) AS valStd
FROM yourTable
GROUP BY `group`
) t2
ON t1.group = t2.group
WHERE ABS(t1.value - t2.valAvg) > t2.valStd -- any record whose value is MORE
-- than one standard deviation from
-- the mean is an outlier
```

**Update:**

It appears that, for some reason, your `value`

column is actual `varchar`

rather than a numeric type. This means you won't be able to do any math on it. So first, convert that column to integer via:

```
ALTER TABLE yourTable MODIFY value INTEGER;
```

If you only want outliers which are *greater* than the average then use the following `WHERE`

clause:

```
WHERE t1.value - t2.valAvg > t2.valStd
```