Bobby Bobby - 20 days ago 5
R Question

Apply different functions to data frame columns depending on the column names matching a pattern

Given a data frame:

l <- list()
l$`__a` <- data.frame(`__ID` = stringi::stri_rand_strings(10, 1),
                      col = stringi::stri_rand_strings(10, 1),
                      check.names = FALSE)

And two supporting functions:

prefixColABC <- function(dfCol) {
  paste0("ABC_", dfCol)
}

prefixColDEF <- function(dfCol) {
  paste0("DEF_", dfCol)
}

How can I apply the first function to the data frame columns whose names start with "__", and the second to all other columns?

To solve this problem, I thought I would first subset all columns with names starting with "__" and apply prefixColABC to them, then subset all the others and apply prefixColDEF to them. Then I would put all of the columns back together into one data frame.

Here's some of my progress:

Here's how the first function can be applied to all columns:

apply(l$`__a`, 2, prefixColABC)

And here's how I can subset the columns, taking all those with names starting with "__":

l$`__a`[, grep(pattern = "^__", names(l$`__a`)), drop = FALSE]

I don't know how to subset all the other columns that don't match this pattern, and I don't know how to set up the condition inside the apply statement.

I think this question is similar to mine, but does not select the columns based on matching a pattern:
R Applying different functions to different data frame columns


Try this assuming that the input data frame is called dd:

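# TRUE for columns whose names start with "__"
hasPrefix <- grepl("^__", names(dd))
# drop = FALSE keeps a data frame even when only one column is selected
dd[, hasPrefix] <- lapply(dd[, hasPrefix, drop = FALSE], prefixColABC)
dd[, !hasPrefix] <- lapply(dd[, !hasPrefix, drop = FALSE], prefixColDEF)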


> dd
    __ID   col
1  ABC_G DEF_x
2  ABC_n DEF_U
3  ABC_c DEF_G
4  ABC_O DEF_X
5  ABC_p DEF_E
6  ABC_U DEF_j
7  ABC_M DEF_G
8  ABC_0 DEF_l
9  ABC_V DEF_i
10 ABC_B DEF_u
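
On the part of the question about selecting the columns that do not match the pattern: a logical vector from grepl can be negated with `!`, whereas grep returns integer positions and takes invert = TRUE instead. The following is only a minimal sketch on a small made-up data frame; the name toy and its columns are illustrative and do not come from the question.

```r
# Hypothetical toy data frame; names chosen only for illustration.
toy <- data.frame(`__ID` = c("a", "b"), `__key` = c("c", "d"),
                  col = c("e", "f"),
                  check.names = FALSE, stringsAsFactors = FALSE)

# Logical mask: TRUE where the column name starts with "__".
mask <- grepl("^__", names(toy))

# Matching and non-matching columns via the mask and its negation.
withPrefix    <- toy[, mask, drop = FALSE]
withoutPrefix <- toy[, !mask, drop = FALSE]

# The same non-matching columns, selected by integer position.
otherIdx <- grep("^__", names(toy), invert = TRUE)
sameSelection <- identical(withoutPrefix, toy[, otherIdx, drop = FALSE])
```

The logical mask is usually the more convenient of the two here, because the same vector serves both assignments.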

Note: The input dd, prior to modification, is:

dd <- structure(list(`__ID` = structure(c(4L, 6L, 3L, 7L, 8L, 9L, 5L, 
1L, 10L, 2L), .Label = c("0", "B", "c", "G", "M", "n", "O", "p", 
"U", "V"), class = "factor"), col = structure(c(8L, 7L, 2L, 9L, 
1L, 4L, 2L, 5L, 3L, 6L), .Label = c("E", "G", "i", "j", "l", 
"u", "U", "x", "X"), class = "factor")), .Names = c("__ID", "col"
), row.names = c(NA, -10L), class = "data.frame")
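
As an alternative sketch (not part of the answer above), the function can also be chosen per column in a single pass by walking over the columns and their names together. This assumes the two helper functions from the question, written out with their closing braces; the helper name prefixByName and the small data frame dd2 are made up for illustration.

```r
# The two functions from the question, restated so the block runs on its own.
prefixColABC <- function(dfCol) paste0("ABC_", dfCol)
prefixColDEF <- function(dfCol) paste0("DEF_", dfCol)

# Hypothetical helper: choose the function from each column's name.
prefixByName <- function(df, pattern = "^__") {
  # Assigning into df[] keeps the data frame class and the column names.
  df[] <- Map(function(column, name) {
    if (grepl(pattern, name)) prefixColABC(column) else prefixColDEF(column)
  }, df, names(df))
  df
}

# Small illustrative input (not the dd from the answer).
dd2 <- data.frame(`__ID` = c("G", "n"), col = c("x", "U"),
                  check.names = FALSE, stringsAsFactors = FALSE)
res <- prefixByName(dd2)
```

Like the lapply version, this works column by column on the data frame itself. apply, by contrast, first converts the data frame to a matrix and returns a matrix, which is one reason it is a less natural fit here.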