Algorath Algorath - 1 year ago 115
SQL Question

How does ORA-00935 trigger? Seeking insight on Oracle SQL's nested group function methodology

I'm using SQL*Plus, or Oracle 11g express, if that's relevant.

I came across something curious that I wish to learn more about regarding nested group functions, as I am new to SQL I found the irony of the error and its solution curious:

This does not work

t1.col1 =
(select col1
from t2
having count(col2) = max(count(col2))
group by col1

With the above, I receive:

ORA-00935: Group Function is Nested too deeply

This does work

t1.col1 =
(select col1
from t2
having count(col2) =
(select max(count(col2)) from t2
group by col1)
group by t2.col1)

Given the above I have two questions:

1) Exactly how/where does the compiler/application become confused?

2) If only for clarity's sake, am I right in assuming the deeper sub-query 'fires' first, and returns a value for the upper sub-query, ergo skipping the mess entirely?

  1. The compiler is not confused.

    The error is simply alerting you to the fact that the aggregate functions don't make sense in that context:

    having count(col2) = max(count(col2)

    The HAVING clause takes effect after the GROUP BY, so COUNT(COL2) is the count of non-null values in the COL2 column for each distinct value of col1. In the context of one distinct value of col1, MAX(COUNT(COL2)) makes no sense - if the result of COUNT(COL2) is 3, then MAX(3), obviously, is 3 - but here the compiler is smart enough to know that you probably didn't intend it that way.

    So, what you were intending is that the COUNT(COL2) on the left-hand side was supposed to be for that distinct value of COL1), but the COUNT(COL2) on the right-hand side was supposed to be over all values of COL1. The expression as a whole, therefore, is a mixture and is not valid SQL.

  2. Yes, you can think of it that way if you like. The deepest subquery is run first, which returns a result set (in this case, a single row with a single column) to the calling subquery.

You may find the following query is more efficient, instead of running effectively two queries against the table:

where t1.col1 =
  (select col1
   from   (select col1, count_col2, MAX(count_col2) OVER () max_count_col2
           from   (select col1,
                          count(col2) AS count_col2
                   from t2
                   GROUP BY col1))
   where  count_col2 = max_count_col2)