rajesh mallela rajesh mallela - 1 year ago 44
SQL Question

How to retrieve complete data from two tables based on user_name along with max date in Hive

For example I have a table A and B with the following data:


user_name date1 count1 count2
X 15 1 1
X 30 1 3
Y 04 1 3


user_name date1 count3 count4 status
X 15 11 1 Y
X 30 13 3 N
Y 04 16 3 NA

How to join these 2 tables for each feedname with max date.
I need the output like these:

username date1 count1 count4 status
X 30 1 3 N

like these way.
Can anyone plz help in these situation.

Answer Source

Since according to your comment every combination (user_name, date1) exists in both tables, you can use e.g.

select a.*, b.count3, b.count4, b.status 
from tableA as a
join tableB as b
on a.user_name = b.user_name and   
   a.date1 = b.date1 
where not exists 
   (select 1 from tableA as a1
    where a1.user_name = a.user_name
      and a1.date1 > a.date1);

You want to have an index on (user_name, date1) to speed it up.

As a side note: If every entry in tableA has exactly 1 entry in tableB and vice-versa (it's not clear from your description if that is the case, but it looks like it), and thus (user_name, date1) would be a primary key in both tables, you absolutely should add the columns count3, count4 and status to tableA and get rid of tableB. You can still use the above code (without join) to find only the max entry per user.