hazyred hazyred - 3 months ago 55
MySQL Question

MySQL 5.7 & only_full_group_by

So, I updated my ubuntu box and mysql came along for the ride to the glorious 5.7 that brings about doom:

[:error] [pid 9211] [client 0.0.0.0:1] PHP Fatal error:
Uncaught PDOException: SQLSTATE[42000]: Syntax error or access violation:
1140 In aggregated query without GROUP BY, expression #1 of SELECT list contains
nonaggregated column 'a.Z'; this is incompatible with sql_mode=only_full_group_by


The line it's referring to is:

$s = $d->prepare("SELECT a.Z,a.Y,a.X,MAX(b.F) as `G` FROM `a`,`b` WHERE (a.P=:va1 && b.R:va2)");


Reject queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on (uniquely determined by) GROUP BY columns.

source: http://dev.mysql.com/doc/refman/5.7/en/sql-mode.html#sqlmode_only_full_group_by

So, here are my 4 questions:

1- nonaggregated columns being
a.Z
,
a.Y
,
a.X
?

2- functionally dependent on = ?

3- uniquely determined by. Literarily, columns that are (for example)
P_Id int NOT NULL UNIQUE
?

4- Even though this error I can fix with adding
any_value()
around
a.Z
,
a.Y
and
a.X
I still don't understand why I'm doing so.
I imagine that the alternative would be to do a group by
a.Z
or/and
a.Y
or/and
a.X
. But which of those 3 do I pick in the group by and why?

Also, I do not want to disable
only_full_group_by
. I would like to understand how to properly fix this current issue and most likely the astronomic number that will follow.

Answer
  1. Yes. Non-aggregated columns are any column that don't use an aggregation function like MAX, COUNT, SUM, GROUP_CONCAT, etc.

  2. Column a is functionally dependent on column b if the value of b implies a particular value of a. This generally means that b is a unique key for the table, and a is some other column in that table.

  3. Uniquely determined by is the same as functional dependency.

  4. The alternative would be to list all the non-aggregated columns in the GROUP BY list: GROUP BY a.Z, a.Y, a.X.

The reason for all this is that when you select columns that aren't in the GROUP BY list, they will come from arbitrary rows in the grouped rows. This leads to many common errors. For instance, a common mistake is to write:

SELECT user_id, MAX(timestamp), ip_address
FROM user_logins
GROUP BY user_id

and expect ip_address to contain the address of the most recent login for each user. But it will actually contain any of the addresses they logged in from, not the one from the row with MAX(timestamp). See SQL Select only rows with Max Value on a Column for the correct way to do that.

The functional dependency exception is typically useful with joins.

SELECT u.user_id, u.user_name, MAX(l.timestamp)
FROM users AS u
JOIN user_logins AS l ON u.user_id = l.user_id
GROUP BY u.user_id

Since user_id is the primary key of the users table, it uniquely determines user_name, so it's not necessary to list that explicitly in GROUP BY.