muni muni - 2 months ago 6
SQL Question

Hive Query is not working as expected

I am trying a left join in Hive Query, but it does not seem to work. It returns me columns only from left table:

create table mb.spt_new_var as select distinct customer_id ,target from mb.spt_201603 A
left outer join mb.temp B
on (A.customer_id=B.cust_id);


I tried selecting few records from table B based on the some random customer_id from table A and it returns some records. But if I try the left join on table A, it returns me only columns from table A. The data-type of both the IDs is same(int). what could be the possible reason behind this?

Sample Table A:

Customer_account_id target
12356 1
34245 0
12356 1
.... ..


Sample Table B:

Cust_id col1 col2 col3
12356 ..
12567 ..
24426 ..
...


Table A has some 1m records, while table B has some 30m records. There is possibility of some duplicate IDs in table A and Table B.

Answer

I'm a bit confused. Hive is returning the columns that you specify in the query:

select distinct a.customer_id, a.target 
from mb.spt_201603 a left outer join
     mb.temp b
     on a.customer_id = b.cust_id;

If you want columns from the second table, you need to select them:

select distinct a.customer_id, a.target, b.col1, b.col2
from mb.spt_201603 a left outer join
     mb.temp b
     on a.customer_id = b.cust_id;