Hunle Hunle - 3 months ago 8
SQL Question

How to use and "in" clause in "having" in HIVE?

I have my data in sometable like this:

col1 col2 col3
A B 3
A B 1
A B 2
C B 1


And I want to get all of the unique groups of
col1
and
col2
that contain certain rows of col3. Like, all groups of col1 and col2 that contain a "2".

I wanted to do something like this:

select col1, col2 from sometable
group by col1, col2
having col3=1 and col3=2


But I want it to only return groups that have an instance of both 1 and 2 in col3. so, the result after the query should return this:

col1 col2
A B


How do I express this in HIVE? THANK YOU.

Answer

I don't know why others deleted answers that where correct and then almost correct but I will put their's back up.

SELECT col1, col2, COUNT(DISTINCT col3)
FROM
    sometable
WHERE
    col3 IN (1,2)
GROUP BY col1, col2
HAVING
    COUNT(DISTINCT col3) > 1

If you actually want to return all of the records that meet your criteria you need to do a sub select and join back to the main table to get them.

SELECT s.*
FROM
    sometable s
    INNER JOIN (
       SELECT col1, col2, COUNT(DISTINCT col3)
       FROM
          sometable
       WHERE
          col3 IN (1,2)
       GROUP BY col1, col2
       HAVING
          COUNT(DISTINCT col3) > 1
    ) t
    ON s.Col1 = t.Col1
    AND s.Col2 = t.Col2
    AND s.col3 IN (1,2)

The gist of this is narrow/filter your rowset to the rows that you want to test col3 IN (1,2) then count the DISTINCT values of col3 to make sure both 1 and 2 exist and not just 1 & 1 or 2 & 2.