Koushik Chandra Koushik Chandra - 26 days ago 10
Bash Question

dropping hive partition based on certain condition in runtime

I have a table in hive like

create table t1 (x int, y int, s string) partitioned by (wk int) stored as sequencefile;

The table is having the below data

select * from t1;
+-------+-------+-------+--------+--+
| t1.x | t1.y | t1.s | t1.wk |
+-------+-------+-------+--------+--+
| 1 | 2 | abc | 10 |
| 4 | 5 | xyz | 11 |
| 7 | 8 | pqr | 12 |
+-------+-------+-------+--------+--+


Now the ask is to drop the oldest partition when partition count is
>=2

Can this be handled in hql or through any shell script and how?

Answer

If your partitions are ordered by date, you could write a shell script in which you could use hive -e 'SHOW PARTITIONS t1' to get all partitions, in your example, it will return:

wk=10
wk=11
wk=12

Then you can issue hive -e 'ALTER TABLE t1 DROP PARTITION (wk=10)' to remove the first partition;

So something like:

if (( `hive -e 'SHOW PARTITIONS t1' | grep wk | wc -l` < 2)) ; then
    exit;
fi
partition=`hive -e 'SHOW PARTITIONS t1' | grep wk | head -1`;
hive -e 'ALTER TABLE t1 DROP PARTITION ($partition)';
Comments