davr davr - 5 months ago 28
SQL Question

Simple way to calculate median with MySQL

What's the simplest (and hopefully not too slow) way to calculate the median with MySQL? I've used

AVG(x)
for finding the mean, but I'm having a hard time finding a simple way of calculating the median. For now, I'm returning all the rows to PHP, doing a sort, and then picking the middle row, but surely there must be some simple way of doing it in a single MySQL query.

Example data:

id | val
--------
1 4
2 7
3 2
4 2
5 9
6 8
7 3


Sorting on
val
gives
2 2 3 4 7 8 9
, so the median should be
4
, versus
SELECT AVG(val)
which ==
5
.

Answer

The problem with the proposed solution (TheJacobTaylor) is runtime. Joining the table to itself is slow as molasses for large datasets. My proposed alternative runs in mysql, has awesome runtime, uses an explicit ORDER BY statement, so you don't have to hope your indexes ordered it properly to give a correct result, and is easy to unroll the query to debug.

SELECT avg(t1.val) as median_val FROM (
SELECT @rownum:=@rownum+1 as `row_number`, d.val
  FROM data d,  (SELECT @rownum:=0) r
  WHERE 1
  -- put some where clause here
  ORDER BY d.val
) as t1, 
(
  SELECT count(*) as total_rows
  FROM data d
  WHERE 1
  -- put same where clause here
) as t2
WHERE 1
AND t1.row_number in ( floor((total_rows+1)/2), floor((total_rows+2)/2) );

[edit] Added avg() around t1.val and row_number in(...) to correctly produce a median when there are an even number of records. Reasoning:

SELECT floor((3+1)/2),floor((3+2)/2);#total_rows is 3, so avg row_numbers 2 and 2
SELECT floor((4+1)/2),floor((4+2)/2);#total_rows is 4, so avg row_numbers 2 and 3