user1032531 user1032531 - 6 months ago 10
SQL Question

Is GROUP BY required with COUNT()?

The following query is intended to return the student name, her schools name, and her school districts name plus the number of schools in her district. Does

GROUP BY d.id
do anything? It appears that the
COUNT()
in the SELECT statement magically does the same thing. If not necessary, is it good practice?

SELECT s.name student_name, sc.name school_name, d.name district_name,
COUNT(sc.id) school_count_at_students_district
FROM students s
INNER JOIN schools sc ON sc.id=s.schools_id
LEFT OUTER JOIN schools sc2 ON sc2.districts_id=sc.districts_id
INNER JOIN districts d ON d.id=sc.districts_id
WHERE s.id=123;

Answer

Say you have the following table:

+----+-----+-----------+
| f1 | f2  |    f3     |
+----+-----+-----------+
|  1 | 4.3 | apple     |
|  2 | 1.4 | orange    |
|  2 | 3   | pear      |
|  3 | 1.2 | kiwi      |
|  3 | 2.2 | pineapple |
|  3 | 1   | pineapple |
+----+-----+-----------+

And you execute:

SELECT f1, sum(f2), f3 FROM table GROUP BY f1, f3;

We include the f1 and f3 columns in the GROUP BY to tell mysql how to aggregate. You'll get back:

+----+-----+-----------+
| f1 | f2  |    f3     |
+----+-----+-----------+
|  1 | 4.3 | apple     |
|  2 | 1.4 | orange    |
|  2 | 3   | pear      |
|  3 | 1.2 | kiwi      |
|  3 | 3.3 | pineapple |
+----+-----+-----------+

Here we get distinct groupings of f1 and f3 records, like we expect.

If you omit either f1 or f3 in the GROUP BY every RDBMS on the planet is going to throw an error, except MySQL. Non-MySQL DBs require you to be explicit about how you are summarizing your fields and they either must be grouped up in the GROUP BY or aggregated with a formula in the SELECT.

If, in MySQL, you omit the GROUP BY:

SELECT f1, sum(f2), f3 FROM table;

you will get records back like:

+----+-----+-----------+
| f1 | f2  |    f3     |
+----+-----+-----------+
|  1 | 4.3 | apple     |
|  2 | 4.4 | orange    |    
|  3 | 4.5 | kiwi      |    
+----+-----+-----------+

Which might be nonsense. The referential integrity of your records has been compromised. Oranges are not 4.4 and Kiwi is not 4.5. What MySQL has done is grabbed the first value it came across in the table and spit it out. This behavior might make sense if:

  1. You don't care much about which value MySQL picks from f3 or...
  2. You are relying on the storage engine of MySQL to maintain the order of your records, which consequently, makes you a fool. Don't do that.

Your best bet for code portability and unexpected results is to include a GROUP BY if you are aggregating with a function in your SELECT. Otherwise you'll get what you deserve.

Comments