R Question

Using regex to extract the part of an email address after @ in a dplyr pipe, then group by it and count occurrences

I have a data frame with a column of email addresses. I want to extract the part of each address after the @ symbol, group by it (e.g. gmail, yahoo, hotmail), and count the occurrences of each.


I can already extract the part after the @ with the code below:

sub(".*@", "", df$registrant_email)

How can I use this in a dplyr pipe and then count the occurrences of each domain?

Answer Source

Split each address at the @ into a two-column character matrix with stringr's str_split_fixed, coerce that matrix to a data frame, and the usual dplyr idioms then apply:


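library(stringr); library(dplyr)  # str_split_fixed() comes from stringr
str_split_fixed(df$registrant_email, pattern = "@", n = 2) %>%
  data.frame() %>%            # columns are auto-named X1 (local part) and X2 (domain)
  group_by(X2) %>% count(X1)  # counts each local part within each domain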

The result is as follows

            X2          X1     n
        <fctr>      <fctr> <int>
1                salesdesk     2
2              123ajumohan     1
3                      123     1
4                 chamukan     1
5              tmrsons1974     1

To give the columns descriptive names, which makes the code easier to read, add setNames():

str_split_fixed(df$registrant_email, pattern = "@", n = 2) %>%
  data.frame() %>% setNames(c("local", "domain")) %>%  # name the two columns
  group_by(domain) %>% count(local)
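
Since the question asks for one count per domain, here is a minimal sketch that reuses the asker's own sub() call inside mutate() and then counts by the new column. The sample data frame and the names domain and domain_counts are made up for illustration; only the column name registrant_email comes from the question.

```r
library(dplyr)

# Hypothetical sample data; only the column name registrant_email is from the question
df <- data.frame(
  registrant_email = c("alice@gmail.com", "bob@yahoo.com", "carol@gmail.com",
                       "dave@hotmail.com", "erin@gmail.com"),
  stringsAsFactors = FALSE
)

domain_counts <- df %>%
  # ".*@" is greedy, so everything up to and including the last "@" is removed
  mutate(domain = sub(".*@", "", registrant_email)) %>%
  # one row per distinct domain, most frequent first
  count(domain, sort = TRUE)

print(domain_counts)
```

Note that group_by(domain) %>% count(local) returns one row per local part within each domain, whereas count(domain) collapses to one row per domain. If only the provider name (gmail rather than gmail.com) is wanted, a further sub("\\..*", "", domain) would drop the first dot and everything after it, assuming the domains carry no subdomain prefix.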
