Shafizadeh Shafizadeh - 6 months ago 9
MySQL Question

Why should I write the rest of columns into GROUP BY when there is an aggregate function?

I have this table structure:

// mytable
+----+------+-------+-------------+
| id | type | score | unix_time |
+----+------+-------+-------------+
| 1 | 1 | 5 | 1463508841 |
| 2 | 1 | 10 | 1463508842 |
| 3 | 2 | 5 | 1463508843 |
| 4 | 1 | 5 | 1463508844 |
| 5 | 2 | 15 | 1463508845 |
| 6 | 1 | 10 | 1463508846 |
+----+------+-------+-------------+


And here is my query:

SELECT SUM(score), unix_time
FROM mytable
WHERE 1
GROUP BY type


And here is the output:

+-------+-------------+
| score | unix_time |
+-------+-------------+
| 30 | 1463508841 |
| 20 | 1463508843 |
+-------+-------------+


Ok, all fine .. Just there is a thing: Professional people suggest me to write
unix_time
into
GROUP BY
. They believe doing that is the base of grouping and aggregate function.

Well why really should I write a (almost) unique column into
GROUP BY
? If I do that then each row will be a separated group and there will be a lot of extra rows which are useless:

+-------+-------------+
| score | unix_time |
+-------+-------------+
| 30 | 1463508841 |
| 30 | 1463508842 |
| 20 | 1463508843 |
| 30 | 1463508844 |
| 20 | 1463508845 |
| 30 | 1463508846 |
+-------+-------------+


See? There is a lot of extra rows. So why doing that is an standard thing? Why everybody tell me MySQL does work without doing that but no database else doesn't .. Well I really don't understand why should I do that ..!

May please someone make it clear for me and explain me how
GROUP BY
works exactly? Is that different than my understanding?

Answer

Not having unix_time in the GROUP BY clause is a non-standard MySQL hack that I would totally stay away from. The values for unix_type across all the rows with the same type are completely different. How do you know which unix_time should appear?

In your example, you seem perfectly content to use a completely arbitrary value of unix_time per group.

However this is a recipe for disaster. What does it even mean to pick some totally arbitrary value from a group? What if the unix_times were spread out by days or weeks or even years? Which one would you take then?

The reason the pros are telling you to put it in the group by clause is so that the result makes sense! Another approach is to leave unix_time out of the select completely, as the result you are getting shouldn't be relied upon.