crf crf - 4 months ago 78
R Question

Use dplyr to convert all variables coded as "Y"/"N" to TRUE/FALSE

I have the following data

df <- data.frame(A = 1:3, YN_B = c('Y', 'N', 'N'), YN_C = c('N', 'N', 'Y'))


These variables that take values in c('Y', 'N') are not very useful to me. They would be much more useful encoded as TRUE for 'Y' and FALSE for 'N'. Helpfully, the Y/N columns are named in a way that lets me find them programmatically. I figure that mutate_if should be a help in that case.

I am trying to achieve this with mutate_if, which I haven't used before, but it's not quite working. Here's my attempt:

df %>% mutate_if(matches('^YN'), .funs = funs(function(x) x == 'Y'))
Error in get(as.character(FUN), mode = "function", envir = envir) :
object 'p' of mode 'function' was not found


Where am I going wrong?

Answer

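matches returns integers that give the positions of the matching columns. mutate_if, however, expects its predicate to be a function that is applied to each column and returns TRUE or FALSE (or a logical vector with one entry per column), so it cannot use those positions. To select columns by name with matches, use mutate_at instead: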

library(dplyr)
df %>% mutate_at(vars(matches('^YN')), funs(. == 'Y'))
#   A  YN_B  YN_C
# 1 1  TRUE FALSE
# 2 2 FALSE FALSE
# 3 3 FALSE  TRUE
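
Note that funs() has been deprecated since dplyr 0.8.0. On dplyr 1.0.0 or later, the same selection can be written with across() inside a plain mutate(). This is a minimal sketch, assuming the df from the question; stringsAsFactors = FALSE is only there so that it behaves the same on R versions before 4.0:

```r
library(dplyr)

# Same data as in the question.
df <- data.frame(A = 1:3,
                 YN_B = c('Y', 'N', 'N'),
                 YN_C = c('N', 'N', 'Y'),
                 stringsAsFactors = FALSE)

# across() picks the columns whose names start with "YN" and applies
# the lambda to each of them; .x stands for the current column.
res <- df %>% mutate(across(matches('^YN'), ~ .x == 'Y'))
res
```

Column A is left untouched because it does not match the pattern.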

Here is an example of how matches works:

matches('^YN', vars = c("A", "YN_B"))
# [1] 2
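
If you would rather stay with mutate_if, its predicate may also be a logical vector with one entry per column, so the name test can be done up front with grepl. A small sketch under that assumption, using the df from the question (the helper name is_yn is just for illustration):

```r
library(dplyr)

# Same data as in the question.
df <- data.frame(A = 1:3,
                 YN_B = c('Y', 'N', 'N'),
                 YN_C = c('N', 'N', 'Y'),
                 stringsAsFactors = FALSE)

# One TRUE/FALSE per column: does the column name start with "YN"?
is_yn <- grepl('^YN', names(df))

# mutate_if() transforms only the columns flagged TRUE.
res <- df %>% mutate_if(is_yn, function(x) x == 'Y')
res
```

A plain function is used for the transformation here so the sketch does not depend on funs().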

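For completeness, here is a case where mutate_if does fit: selecting columns by type rather than by name. This relies on the Y/N columns being character vectors, which is the default from R 4.0 on; with older versions, create the data frame with stringsAsFactors = FALSE: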

lapply(df, class)
# $A
# [1] "integer"

# $YN_B
# [1] "character"

# $YN_C
# [1] "character"

df %>% mutate_if(is.character, funs(. == 'Y'))
#   A  YN_B  YN_C
# 1 1  TRUE FALSE
# 2 2 FALSE FALSE
# 3 3 FALSE  TRUE
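
One caveat with is.character is that it converts every character column, including ones that are not Y/N flags. A stricter predicate can check the contents as well. The sketch below is illustrative only: the Name column and the helper is_yn_col are hypothetical additions, not part of the original data.

```r
library(dplyr)

# Data from the question plus a hypothetical free-text column.
df2 <- data.frame(A = 1:3,
                  YN_B = c('Y', 'N', 'N'),
                  YN_C = c('N', 'N', 'Y'),
                  Name = c('ann', 'bob', 'cy'),
                  stringsAsFactors = FALSE)

# TRUE only for character columns whose values are all 'Y' or 'N'.
# A column containing NA fails this test and is left unchanged.
is_yn_col <- function(x) is.character(x) && all(x %in% c('Y', 'N'))

res <- df2 %>% mutate_if(is_yn_col, function(x) x == 'Y')
res
```

Here YN_B and YN_C become logical while Name stays a character column.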