Algorath Algorath - 6 months ago 10
SQL Question

How does ORA-00935 trigger? Seeking insight on Oracle SQL's nested group function methodology

I'm using SQL*Plus, or Oracle 11g express, if that's relevant.

I came across something curious that I wish to learn more about regarding nested group functions, as I am new to SQL I found the irony of the error and its solution curious:

This does not work

where
t1.col1 =
(select col1
from t2
having count(col2) = max(count(col2))
group by col1
;


With the above, I receive:


ORA-00935: Group Function is Nested too deeply


This does work

where
t1.col1 =
(select col1
from t2
having count(col2) =
(select max(count(col2)) from t2
group by col1)
group by t2.col1)
;


Given the above I have two questions:

1) Exactly how/where does the compiler/application become confused?

2) If only for clarity's sake, am I right in assuming the deeper sub-query 'fires' first, and returns a value for the upper sub-query, ergo skipping the mess entirely?

Answer
  1. The compiler is not confused.

    The error is simply alerting you to the fact that the aggregate functions don't make sense in that context:

    having count(col2) = max(count(col2)
    

    The HAVING clause takes effect after the GROUP BY, so COUNT(COL2) is the count of non-null values in the COL2 column for each distinct value of col1. In the context of one distinct value of col1, MAX(COUNT(COL2)) makes no sense - if the result of COUNT(COL2) is 3, then MAX(3), obviously, is 3 - but here the compiler is smart enough to know that you probably didn't intend it that way.

    So, what you were intending is that the COUNT(COL2) on the left-hand side was supposed to be for that distinct value of COL1), but the COUNT(COL2) on the right-hand side was supposed to be over all values of COL1. The expression as a whole, therefore, is a mixture and is not valid SQL.

  2. Yes, you can think of it that way if you like. The deepest subquery is run first, which returns a result set (in this case, a single row with a single column) to the calling subquery.

You may find the following query is more efficient, instead of running effectively two queries against the table:

where t1.col1 =
  (select col1
   from   (select col1, count_col2, MAX(count_col2) OVER () max_count_col2
           from   (select col1,
                          count(col2) AS count_col2
                   from t2
                   GROUP BY col1))
   where  count_col2 = max_count_col2)