PPPP PPPP - 1 year ago 55
Apache Configuration Question

selecting incremental data from multiple tables in Hive

I have five tables(A,B,C,D,E) in Hive database and I have to union the data from these tables based on logic over column "id".

The condition is :

Select * from A
UNION
select * from B (except ids not in A)
UNION
select * from C (except ids not in A and B)
UNION
select * from D(except ids not in A,B and C)
UNION
select * from E(except ids not in A,B,C and D)


Have to insert this data into final table.

One way is to create a the target table (target)and append it with data for each UNION stage and then using this table for joining with the other UNION stage.

This would be the part of my .hql file :

insert into target
(select * from A
UNION
select B.* from
A
RIGHT OUTER JOIN B
on A.id=B.id
where ISNULL(A.id));

INSERT INTO target
select C.* from
target
RIGHT outer JOIN C
ON target.id=C.id
where ISNULL(target.id);

INSERT INTO target
select D.* from
target
RIGHT OUTER JOIN D
ON target.id=D.id
where ISNULL(target.id);

INSERT INTO target
select E.* from
target
RIGHT OUTER JOIN E
ON target.id=E.id
where ISNULL(target.id);





Is there a better to make this happen ? I assume we anyway have to do the
multiple joins/lookups .I am looking forward for best approach to achieve this
in


1) Hive with Tez

2) Spark-sql

Many Thanks in advance

Answer Source

If id is unique within each table, then row_number can be used instead of rank.

select      *

from       (select      *
                       ,rank () over
                        (
                            partition by    id
                            order by        src
                        )                           as rnk

            from        (           
                                    select 1 as src,* from a
                        union all   select 2 as src,* from b
                        union all   select 3 as src,* from c
                        union all   select 4 as src,* from d
                        union all   select 5 as src,* from e
                        ) t
            ) t

where       rnk = 1
;
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download