Lina P Lina P - 25 days ago 6
SQL Question

Invalid column reference for Rank function in SQL when using SUM

SELECT
*
FROM(
SELECT
imps.org_name,
imps.org_id,
imps.adv_name,
imps.adv_id,
imps.mc,
Rank() over (partition by imps.org_id ORDER by imps.mc desc) as Rank
FROM(
SELECT
org_name,
org_id,
adv_name,
adv_id,
sum(cost/1000) as mc,
FROM
table1
WHERE
org_id in (12345, 54321)
AND
date
BETWEEN
'2016-09-10'
AND
'2016-11-01'
GROUP BY
adv_id,
org_name,
org_id,
adv_name) imps
GROUP BY
imps.org_name,
imps.org_id,
imps.adv_name,
imps.adv_id) r
WHERE r.Rank <= 5;


When running this query I get the error

FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies.
Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 10:65 Invalid column reference 'mc'


Since that column is clearly defined not sure why it throws the error.
I've tried sum(imps.mc) and that seems to work but I am uncomfortable putting the sum inside the rank function since it doesn't seem efficient.

Overall question: is there a better way of doing this ranking?

Answer

This is what I want to try: (as a comment this is wouldn't be possible to easily post: not sure if it will have any impact, just eliminating a potential error (though I doubt it is here)

  • made imps a cte
  • Removed group by on R. window functions to my knowledge don't need a group by.
  • removed comma after mc on innermost query when I made it a cte.

.

With imps as (
        SELECT 
            org_name,
            org_id,
            adv_name,
            adv_id,
            sum(cost/1000) as mc 
        FROM
            table1
        WHERE
            org_id in (12345, 54321)
        AND 
            date 
        BETWEEN 
            '2016-09-10'
        AND 
            '2016-11-01'
        GROUP BY
            adv_id, 
            org_name, 
            org_id, 
            adv_name)
SELECT
*
FROM(
    SELECT 
        imps.org_name, 
        imps.org_id,
        imps.adv_name,
        imps.adv_id,
        imps.mc,
        Rank() over (partition by imps.org_id ORDER by imps.mc desc) as Rank
    FROM IMPS) r 
WHERE r.Rank <= 5;