R Question

Creating a company ID column based on email address in R (dplyr answer preferred)

I haven't tackled this part yet, but I figured I'd throw it up here to see if I can get some hints before I attempt this tomorrow. I have a list of emails with varying addresses: john@ketchup.com, cindy@mustard.com, bob@relish.com. I'd like to create a column that assigns each new company name an ID. For example, @ketchup.com gets company ID 1, @mustard.com gets company ID 2, and so on.

Would I be able to use dplyr's mutate function for this? I'm thinking of using the "starts with" command on the emails to create a new column based on them. The only downside I can think of with this approach is that if the email list were larger (thankfully mine can be done manually), you'd have to assign an ID per email, which may become cumbersome.

Anyways, answers are appreciated!

Answer Source

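We could extract the substring after the @ and use match() against the unique domains to create a new 'ID' column. Each distinct domain is numbered in order of first appearance, so nothing has to be assigned by hand, however long the list is.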

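library(dplyr)
library(tidyr)  # provides separate()

df1 %>%
    separate(email, into = c('prefix', 'suffix'), sep = "@", remove = FALSE) %>%
    # number each distinct domain in order of first appearance
    mutate(ID = match(suffix, unique(suffix))) %>%
    select(-prefix, -suffix)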
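
To try the idea without any packages, here is a minimal sketch in base R. The data frame `df1` and its `email` column are assumptions built from the addresses in the question, and `amy@ketchup.com` is a made-up fourth address added only to show a repeated domain. `sub()` strips everything up to the `@`, and `match()` against `unique()` numbers the domains as they first appear.

```r
# Hypothetical sample data; the last address is invented to show a repeat
df1 <- data.frame(
  email = c("john@ketchup.com", "cindy@mustard.com",
            "bob@relish.com", "amy@ketchup.com"),
  stringsAsFactors = FALSE
)

# Keep only the part after the "@"
domain <- sub("^.*@", "", df1$email)

# Number each distinct domain in order of first appearance
df1$ID <- match(domain, unique(domain))

df1
```

Here both ketchup.com addresses get ID 1, mustard.com gets 2, and relish.com gets 3. The same `match(domain, unique(domain))` expression should also work inside `mutate()` if you would rather keep everything in one dplyr pipeline.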
