davr - 10 months ago 54

SQL Question

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used

`AVG(x)`

Example data:

`id | val`

--------

1 4

2 7

3 2

4 2

5 9

6 8

7 3

Sorting on

`val`

`2 2 3 4 7 8 9`

`4`

`SELECT AVG(val)`

`5`

Answer

The problem with the proposed solution (TheJacobTaylor) is runtime. Joining the table to itself is slow as molasses for large datasets. My proposed alternative runs in mysql, has awesome runtime, uses an explicit ORDER BY statement, so you don't have to hope your indexes ordered it properly to give a correct result, and is easy to unroll the query to debug.

```
SELECT avg(t1.val) as median_val FROM (
SELECT @rownum:=@rownum+1 as `row_number`, d.val
FROM data d, (SELECT @rownum:=0) r
WHERE 1
-- put some where clause here
ORDER BY d.val
) as t1,
(
SELECT count(*) as total_rows
FROM data d
WHERE 1
-- put same where clause here
) as t2
WHERE 1
AND t1.row_number in ( floor((total_rows+1)/2), floor((total_rows+2)/2) );
```

[edit] Added avg() around t1.val and row_number in(...) to correctly produce a median when there are an even number of records. Reasoning:

```
SELECT floor((3+1)/2),floor((3+2)/2);#total_rows is 3, so avg row_numbers 2 and 2
SELECT floor((4+1)/2),floor((4+2)/2);#total_rows is 4, so avg row_numbers 2 and 3
```