Kalimantan - 1 year ago
Python Question

Keeping domain of Email but removing TLD

I am using Python and I want to keep the domain of an email address but remove the TLD ('.com', '.co.uk', '.us', etc.).

So basically, if I have an email such as random@gmail.com, I want only @gmail left as a string, and I want this to work for any email: random@yahoo.com should give @yahoo, and random@aol.uk should give @aol.

So far I have:

import re

domain = re.search(r"@[\w.]+", val)
domain = domain.group()

That returns the domain, but with the TLD still attached, e.g. @gmail.com or @aol.co.
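A minimal fix for the regex approach, assuming (as the answer below also does) that the part you want is the first label after '@': exclude the dot from the character class so the match stops before the TLD. The helper name domain_without_tld is hypothetical. Because only the first label is kept, an address on a subdomain such as random@mail.example.com would give @mail rather than @example.

```python
import re

def domain_without_tld(email):
    """Hypothetical helper: return '@' plus the first domain label, or None if nothing matches."""
    # [^.@\s]+ matches up to (not including) the first dot, '@', or whitespace,
    # so 'gmail.com' stops at 'gmail' and 'aol.co.uk' stops at 'aol'.
    match = re.search(r"@[^.@\s]+", email)
    return match.group() if match else None

for val in ["random@gmail.com", "random@yahoo.com", "random@aol.uk", "random@aol.co.uk"]:
    print(domain_without_tld(val))  # @gmail, @yahoo, @aol, @aol
```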

Answer

With pandas string methods, use str.split:

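import pandas as pd

df = pd.DataFrame({'a': ['random@yahoo.com', 'random@aol.uk', 'random@aol.co.uk']})

print(df)
#                   a
# 0  random@yahoo.com
# 1     random@aol.uk
# 2  random@aol.co.uk

# Take the part after '@', then everything before the first '.'
# (n must be passed by keyword in pandas 2.0 and later)
print('@' + df.a.str.split('@').str[1].str.split('.', n=1).str[0])
# 0    @yahoo
# 1      @aol
# 2      @aol
# Name: a, dtype: object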
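To see what each step of that chain does, here is the same logic applied to a single plain Python string (a sketch; the variable names are only illustrative). Splitting once on the first '.' after the '@' is what makes '.uk' and '.co.uk' behave the same way: everything after the first label is dropped, whatever it contains.

```python
email = 'random@aol.co.uk'

after_at = email.split('@')[1]           # 'aol.co.uk'
first_label = after_at.split('.', 1)[0]  # 'aol' (maxsplit=1: split only at the first dot)
result = '@' + first_label               # '@aol'
print(result)
```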

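But apply with plain Python string methods is faster, provided the column contains no NaN values (a float NaN has no split method, so the lambda would raise):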

# Enlarge the sample to 30,000 rows for timing
df = pd.concat([df]*10000).reset_index(drop=True)

print('@' + df.a.str.split('@').str[1].str.split('.', n=1).str[0])
print(df.a.apply(lambda x: '@' + x.split('@')[1].split('.')[0]))

In [363]: %timeit ('@' + df.a.str.split('@').str[1].str.split('.', n=1).str[0])
10 loops, best of 3: 79.1 ms per loop

In [364]: %timeit (df.a.apply(lambda x: '@' + x.split('@')[1].split('.')[0]))
10 loops, best of 3: 27.7 ms per loop
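If the column may contain NaN, the apply version fails because a float NaN has no split method. Two possible workarounds are sketched below under the same first-label assumption; their speed relative to the timings above is not measured here, and the strip_tld helper is hypothetical. Series.str.extract is vectorized and returns NaN for missing or non-matching values; the guarded function keeps the plain-string approach but skips anything it cannot split.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['random@yahoo.com', np.nan, 'random@aol.co.uk']})

# Option 1: one regex capture group containing '@' and the first domain label.
# expand=False returns a Series; NaN or a non-matching string gives NaN.
extracted = df.a.str.extract(r'(@[^.@]+)', expand=False)

def strip_tld(value):
    """Hypothetical helper: '@' + first domain label, or NaN for non-strings or strings without '@'."""
    if not isinstance(value, str) or '@' not in value:
        return np.nan
    return '@' + value.split('@')[1].split('.')[0]

# Option 2: plain Python string methods with a guard, applied row by row.
applied = df.a.apply(strip_tld)

print(extracted.tolist())
print(applied.tolist())
```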