user181218 - 1 month ago
Bash Question

Script for filtering and extracting Hive table names

I am trying to write a script that iterates over all the databases on a Hive server and, for each database, returns the list of tables that contain a column with a given name. Concretely (in pseudo-code):

list l
for d in show databases:
    use d
    for tbl in show tables:
        res = describe tbl | grep col_name
        if res not empty:
            l.append(tbl.name)
return l


I am not sure how to code this. Any help? If there are good references on combining these shell commands with pipes, etc., I'd appreciate a recommendation.

Answer

One option is to run each statement with hive -e '<hive command>', which works regardless of your choice of scripting language:

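hive -e 'show databases' will list all databases

hive -e "use $d; show tables" will list all tables in database $d

hive -e "use $d; describe $tbl" will describe table $tbl in database $d

Note the double quotes in the last two commands: inside single quotes the shell does not expand $d and $tbl, so Hive would receive the literal text $d instead of the database name.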
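A minimal Bash sketch of how these three commands can be combined into the loop from the question follows. It assumes the Hive CLI (hive) is on your PATH and uses its -S (silent) flag so that only result rows are printed; the helper names hive_query and find_tables_with_column, and the file name find_tables.sh, are illustrative, not part of Hive. Instead of a plain grep, it compares the first field of each describe row exactly (ignoring case), so searching for id does not also match user_id or a comment that happens to contain "id".

```bash
#!/usr/bin/env bash
# Sketch: print "db.table" for every Hive table that has a column named $1.
# Assumes the Hive CLI ("hive") is on PATH. Function names are illustrative.

# Run one or more HiveQL statements and print only the result rows.
# -S (silent) suppresses the "OK" / "Time taken" messages; stdin comes from
# /dev/null so a hive process cannot consume the names the loops are reading.
hive_query() {
  hive -S -e "$1" < /dev/null
}

find_tables_with_column() {
  local col="$1"
  hive_query "show databases" | while IFS= read -r db; do
    hive_query "use $db; show tables" | while IFS= read -r tbl; do
      # describe prints one row per column: name, type, comment.
      # Compare the first field exactly (Hive names are case-insensitive),
      # so searching for "id" does not also match "user_id".
      if hive_query "use $db; describe $tbl" |
          awk -v c="$col" 'tolower($1) == tolower(c) { found = 1 } END { exit !found }'; then
        printf '%s.%s\n' "$db" "$tbl"
      fi
    done
  done
}

# Only run when a column name is given, e.g. ./find_tables.sh customer_id
if [ "$#" -ge 1 ]; then
  find_tables_with_column "$1"
fi
```

Run it as bash find_tables.sh col_name; it prints one db.table line per match, which you can pipe into other commands or redirect to a file. Each hive invocation starts a new JVM, so on a server with many tables this one-call-per-table approach can be slow.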