rimsky rimsky - 11 months ago 51
SQL Question

Is HAVING ever necessary for non-aggregated grouped columns?

I have some code that generates SQL and need to understand if HAVING is ever necessary (or useful) for non-aggregated grouped columns? I haven't found any examples that suggest it is but wanted to check here.

The MySQL docs has this comment "The SQL standard requires that HAVING must reference only columns in the GROUP BY clause or columns used in aggregate functions."

I know that HAVING is necessary for aggregated conditions on groups, and also understand that WHERE can be used for non-aggregated grouped columns (which can be more efficient than having), but my questions is this:

Is HAVING ever necessary (or useful) for non-aggregated grouped columns?


Answer Source

HAVING is specifically designed for aggregated columns. MySQL allows non-aggregated columns in the HAVING clause. There are three use-cases, that I can think of:

  1. An efficiency hack, when the values of the group having identical values for the group.
  2. An error, which should be avoided.
  3. An efficiency hack, when there is no aggregation.

The first could conceivably be used in a situation like this:

select l.*, sum(x.y)
from list l join 
     . . .
group by l.listid
having l.foo = 'bar';

This works because all l.foo should have the same value for a given l.listid (assuming l.listid is a primary key). In this case, this filters the data as if you used where.

BUT, if this condition is not true, then the HAVING/WHERE equivalence is not true. The HAVING will choose a value from an indeterminate row and then filter the resulting aggregation column. The WHERE does the filtering before the aggregation. So, if lists could have the same type and you do:

select l.*, sum(x.y)
from list l join 
     . . .
group by l.type
having l.foo = 'bar';

This is a badly formed query (hence an error in my opinion), but is not equivalent to moving the condition to the WHERE.

The third situation is where there is no aggregation:

select l.*, concat('a', 'b', 'c') as test
from list l
having test = 'abc';

This is a convenience in MySQL. Other dialects would use a subquery. MySQL materializes subqueries, introducing inefficiency.