MP61 - 1 month ago
R Question

Using dplyr, remove all strings from a data frame

I have a data frame with 300 columns, and somewhere in it is a string value that I am trying to remove. I found the solution below on Stack Overflow, which uses lapply. It does what I want, but I would like to do the same thing with the dplyr package. I have tried the mutate_each function but can't seem to make it work.

"If your data frame (df) is really all integers except for NAs and garbage then the following converts it.

df2 <- data.frame(lapply(df, function(x) as.numeric(as.character(x))))

You'll have a warning about NAs introduced by coercion but that's just all those non-numeric character strings turning into NAs."

Answer

If you want to use this line of code:

df2 <- data.frame(lapply(df, function(x) as.numeric(as.character(x))))

with dplyr (by which I assume you mean "using pipes"), the easiest way would be

df2 = df %>% lapply(function(x) as.numeric(as.character(x))) %>%
    as.data.frame
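
The as.character() step matters when a column is a factor. Calling as.numeric() directly on a factor returns its internal integer codes, not the labels you see printed. A minimal base R sketch, using a made-up vector f that is not part of the original data:

```r
# Hypothetical factor column: two numbers plus one stray string
f <- factor(c("10", "20", "oops"))

# Direct conversion returns the internal level codes, not the labels
codes <- as.numeric(f)

# Going through character recovers the labels; "oops" becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
values <- suppressWarnings(as.numeric(as.character(f)))

print(codes)    # 1 2 3
print(values)   # 10 20 NA
```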

To "translate" this into the mutate_each idiom:

mutate_each(df, funs(as.numeric(as.character(.))))

This function will, of course, convert all columns to character, then to numeric. To improve efficiency, don't bother doing two conversions on columns that are already numeric:

mutate_each(df, funs({
    if (is.numeric(.)) return(.)
    as.numeric(as.character(.))
}))
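
mutate_each() and funs() have since been deprecated in dplyr. On a recent version (this sketch assumes dplyr 1.0.0 or later, which is not what the answer above was written against), roughly the same idea can be expressed with across() and where(), so that only the non-numeric columns are touched:

```r
library(dplyr)

# Test data from this answer: one integer column, one factor column
df <- data.frame(v1 = 1:10, v2 = factor(11:20))

# Convert only the columns that are not already numeric
df2 <- df %>%
  mutate(across(where(function(x) !is.numeric(x)),
                function(x) as.numeric(as.character(x))))

str(df2)
```

Columns that are already numeric, such as v1, are passed through unchanged.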

Data for testing:

df = data.frame(v1 = 1:10, v2 = factor(11:20))
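
The test data above contains no stray strings, so it will not trigger the coercion warning. A small extension shows the case from the question, using the base R line quoted earlier. The column v3 and its value "abc" are made up for illustration:

```r
# Same columns as above, plus a hypothetical character column with one bad entry
df <- data.frame(v1 = 1:10,
                 v2 = factor(11:20),
                 v3 = c(21:29, "abc"),
                 stringsAsFactors = FALSE)

# suppressWarnings() hides the expected "NAs introduced by coercion" warning
df2 <- suppressWarnings(
  data.frame(lapply(df, function(x) as.numeric(as.character(x))))
)

sapply(df2, class)   # every column is now numeric
sum(is.na(df2$v3))   # one NA, where "abc" used to be
```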