Joel Joel - 5 months ago 15
MySQL Question

Cannot cumulatively sum `COUNT(*)`

The second section of this answer uses variables to create a cumulative sum of another column. I'm doing the same thing, except that I am using a

GROUP BY
statement, and summing
COUNT(*)
instead of a column. Here is my code to create a minimal table and insert values:

CREATE TABLE `test_group_cumulative` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`group_id` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

INSERT INTO `test_group_cumulative` (`id`, `group_id`)
VALUES
(1, 1),
(2, 2),
(3, 3);


And here is the code that is failing:

SELECT
`group_id`,
COUNT(*) AS `count`,
@count_cumulative := @count_cumulative + COUNT(*) AS `count_cumulative`
FROM `test_group_cumulative` AS `tgc`
JOIN (SELECT @count_cumulative := 0) AS `_count_cumulative`
GROUP BY `group_id`
ORDER BY `id`;


Here is the result:

group_id count count_cumulative
1 1 1
2 1 1
3 1 1


As you can see,
count_cumulative
is NOT summing correctly. However, here's the weird part. If I replace the
COUNT(*)
in
count_cumulative
with it's value,
1
, the query works correctly.

@count_cumulative := @count_cumulative + 1 AS `count_cumulative`


Here is the correct result:

group_id count count_cumulative
1 1 1
2 1 2
3 1 3


Obviously, in my app, there will be more than one item in each group, so
COUNT(*)
won't always be
1
. I know there are ways to do this with joins or subqueries, and I'll do that if I have to, but in my mind this SHOULD work. So why isn't
COUNT(*)
working inside of a cumulative sum?

Answer Source

This is a problem I often face when doing time series analysis. My preferred way to tackle this is to wrap it into a second select and introduce the counter in the last layer. And you can adapt this technique to more complicated data flows using temporary tables, if reqiured.

I did this small sqlfiddle using the schema you present: http://sqlfiddle.com/#!2/cc97e/21

And here is the query to get the cumulative count:

SELECT
tgc.group_id, @count_cumulative := @count_cumulative + cnt as cum_cnt
FROM (
  SELECT
    group_id, COUNT(*) AS cnt
  FROM `test_group_cumulative` 
  group by group_id
  order by id) AS `tgc`, 
(SELECT @count_cumulative := 0) AS `temp_var`; 

This is the result I get:

GROUP_ID    CUM_CNT
1           1
2           2
3           3

The reason your attempt did not work:

When you do a group by with the temporary variable, mysql executes individual groups independently, and at the time each group is assigned the temporary variable current value, which in this case is 0.

If, you ran this query:

SELECT @count_cumulative;

immediately after

SELECT
    `group_id`,
    COUNT(*) AS `count`,
    @count_cumulative := @count_cumulative + COUNT(*) AS `count_cumulative`
FROM `test_group_cumulative` AS `tgc`
JOIN (SELECT @count_cumulative := 0) AS `_count_cumulative`
GROUP BY `group_id`
ORDER BY `id`;

you would get the value 1. For each of your groups, the @count_cumulative is being reset to 0.

Hence, in my proposed solution, I circumvent this issue by generating the 'group-counts' first and then doing the accumulation.