vio vio - 2 months ago 7
R Question

Return value conditional on characters

I am dealing with a dataset that looks something like this:

Year  Column1
2000  yes no
2001  yes yes
2002  yes
2003  N/A yes
2004  N/A N/A
2005  no no


As you can see, a single cell can contain several different strings. I want to create two new columns with numeric values that summarise Column1. My end product might look like this:

Year  Column1  any_yes  yes_count
2000  yes no   1        1
2001  yes yes  1        2
2002  yes      1        1
2003  N/A yes  1        1
2004  N/A N/A  0        0
2005  no no    0        0


Here "any_yes" checks whether the cell in Column1 contains a "yes" and returns 1 or 0, and "yes_count" counts the number of "yes" occurrences in the cell and returns that count. My best guess for any_yes would be something like this if I were dealing with numbers:

mydata1 <- mydata %>%
  mutate(any_yes = ifelse(Column1 == "yes", 1, 0))


Since I'm not dealing with numbers, I'm not sure how it works. I also don't know how to make the yes_count happen.

Answer

We can use str_count (from stringr) and grepl to do this. grepl returns TRUE/FALSE for each cell, and the unary + converts that to 1/0.

library(stringr)
library(dplyr)
df %>% 
     mutate(any_yes = +(grepl("yes", Column1)),
             yes_count = str_count(Column1, "yes"))
#    Year Column1 any_yes yes_count
#1 2000  yes no       1         1
#2 2001 yes yes       1         2
#3 2002     yes       1         1
#4 2003 N/A yes       1         1
#5 2004 N/A N/A       0         0
#6 2005   no no       0         0
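
The answer does not show how df is built. As a minimal sketch, assuming the data frame simply mirrors the table in the question (this construction is an assumption, not part of the original answer), the following base R snippet builds it and shows what the unary + does to the grepl result:

```r
# Assumed reconstruction of the question's data; not taken from the answer
df <- data.frame(
  Year = 2000:2005,
  Column1 = c("yes no", "yes yes", "yes", "N/A yes", "N/A N/A", "no no"),
  stringsAsFactors = FALSE
)

# grepl() gives one logical per row: TRUE if "yes" occurs anywhere in the cell
hits <- grepl("yes", df$Column1)

# Unary plus coerces the logical vector to integer 1/0
any_yes <- +hits
print(any_yes)
# [1] 1 1 1 1 0 0
```

as.integer(hits) would do the same thing and may read more clearly than +(...).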

We can also get the same output without dplyr, using transform from base R (str_count still comes from stringr):

transform(df, any_yes = +(grepl("yes", Column1)),
              yes_count = str_count(Column1, "yes"))

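Or without using any packages (multiplying by any_yes zeroes out the cells with no match, for which gregexpr still returns one element):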

within(df, {any_yes <- +(grepl("yes", Column1))
              yes_count <-  lengths(gregexpr("yes", Column1))* any_yes})
#   Year Column1 yes_count any_yes
#1 2000  yes no         1       1
#2 2001 yes yes         2       1
#3 2002     yes         1       1
#4 2003 N/A yes         1       1
#5 2004 N/A N/A         0       0
#6 2005   no no         0       0
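
The reason for the multiplication is that gregexpr returns a single -1 for a string with no match, so lengths() alone reports 1 for those cells. As a sketch of one alternative, assuming the same example values as in the question, regmatches extracts the actual matches, so a cell with no match yields an empty vector and a count of 0 without any correction:

```r
# Example values copied from the question's Column1
x <- c("yes no", "yes yes", "yes", "N/A yes", "N/A N/A", "no no")

# Naive count: no-match cells come back as -1, which still has length 1
naive <- lengths(gregexpr("yes", x))
print(naive)
# [1] 1 2 1 1 1 1

# regmatches() keeps only real matches, so no-match cells have length 0
yes_count <- lengths(regmatches(x, gregexpr("yes", x)))
print(yes_count)
# [1] 1 2 1 1 0 0
```

Either form gives the same yes_count column for this data; the regmatches version just does not depend on any_yes having been computed first.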