Bala Bala - 1 year ago 81
SQL Question

Extract domain_name alone from email_id in Hive

In my Hive table, i have an email address column : xxxx@gmail.com
i want to display "gmail" alone. I tried using regexp_extract but not getting the desired output. nowhere near.

Also for email ids like xxxx@ffff.nec.co.jp , i want the output to be "ffff.nec.co"

Below is my expression : select regexp_extract('xxxx@gmail.com','.([^.]+)')
but it does not bring the desired output.
Please help me out guys. Regexp or substr is fine for me.

Answer Source

Option 1

hive> select  regexp_extract('xxxx@gmail.com'       ,'@(.*)\\.',1);
OK
gmail

hive> select  regexp_extract('xxxx@ffff.nec.co.jp'  ,'@(.*)\\.',1);
OK
ffff.nec.co

Option 2

hive> select  regexp_extract('xxxx@gmail.com'       ,'(?<=@).*(?=\\.)',0);
OK
gmail

hive> select  regexp_extract('xxxx@ffff.nec.co.jp'  ,'(?<=@).*(?=\\.)',0);
OK
ffff.nec.co
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download