dedpo dedpo - 7 months ago 17
SQL Question

syntax error in Hive Query

enter image description here

I am trying to answer this question

Of the right-handed batters who were born in October and died in
2011, which one had the most hits in his career?


My attempt to get the query, Please ignore the total, it supposed to for sums for b.hits, dont know how to alias it.

SELECT n.id, n.bmonth, n.dyear,n.bats, SUM(b.hits) FROM master n
JOIN (SELECT b.id , b.hits FROM batting GROUP BY id) o
WHERE n.bmonth == 10 AND n.dyear == 2011) x
ON x.id=n.id
ORDER BY total DESC;


Incase anyone needs the schema of the two tables used, look below.

INSERT OVERWRITE DIRECTORY '/home/hduser/hivetest/answer4'
SELECT n.id, n.bmonth, n.dyear,n.bats, SUM(b.hits) FROM master n
JOIN (SELECT b.id , b.hits FROM batting GROUP BY id) o
WHERE n.bmonth == 10 AND n.dyear == 2011) x
ON x.id=n.id
ORDER BY total DESC;

Answer

First, although Hive accepts ==, that doesn't mean you should use it. The standard SQL equality operator is simply =. There is no reason to use a synonym.

I suspect the problem is several things:

  • The lack of group by.
  • Mis-use of aggregation functions.
  • Missing aliases
  • SQL query clauses in the correct order
  • Unbalanced parentheses

In other words, the query is just a mess. You need to review the basics of query syntax. Does this work?

SELECT m.id, m.bmonth, m.dyear, m.bats, b.hits as total
FROM master m JOIN
     (SELECT b.id, SUM(b.hits) as hits
      FROM batting b
      GROUP BY id
     ) b
     ON b.id = m.id 
WHERE m.bmonth = 10 AND m.dyear = 2011
ORDER BY total DESC;
Comments