Max Headroom Max Headroom - 1 month ago 9
R Question

How to check if at least one of multiple variables is equal to 1 in a function?

I have multiple response data that has been split into separate columns with cSplit_e into a format like this...

ID Response IM2 IM4 ... IM10 IM16
1 1 4,7,10 NA 1 1 NA
2 2 7,5,16,8 NA NA NA 1
3 3 2,10 1 NA 1 NA


I'm trying to set up a function will check each row to see if a subset of columns contains at least one "1". It will then create a new column, setting it to "1" if a row had at least one "1" in the specified columns.

Previously I have done this by writing out a for loop for each column I want to create, like so...

parade$q9PaperAggregate <- NA
parade$q9MagazineAggregate <- NA

#Newspaper Aggregate Loop
for (i in 1:nrow(parade)) { #Starts loop setting i to each row number
if (is.na(parade$q9PaperAds[i]) == FALSE | ##These three lines check each row is not all NA
is.na(parade$q9PaperCircs[i]) == FALSE |
is.na(parade$q9PaperWebAds[i]) == FALSE) {
parade$q9PaperAggregate[i] <- 1 #Sets agg cell value to 1 if not all NA for each i
}
}
#Magazine Aggregate Loop
for (i in 1:nrow(parade)) {
if (is.na(parade$q9MagazineAds[i]) == FALSE |
is.na(parade$q9MagazineWebAds[i]) == FALSE) {
parade$q9MagazineAggregate[i] <- 1
}
}


This works, but is clearly inefficient. I want to create a general function that does this for inputs. Here is what I have so far:

#df = object; n = new column name; col = vector of columns I want to check
atleastone <- function(df, n, col) {
#n = new column name (will run over list of vector - new col names with the old columns you want to agg)
df[n] <- NA
for (i in 1:nrow(df)) { #Starts loop setting i to each row number
if (df[i, col] == 1) {
(df[n])[i] <- 1 #Sets new column cell value to 1 if not all NA for each i
}
}
}


My main two issues are 1) how to run the for loop to check multiple columns for the value if the number of columns to be checked can vary and 2) how to pass the row and column to subset. Currently "col" uses the actual name of the column while "i" just takes the numerical row value. This was fine in the format I used before of...

df$column[i]


...but the $ operator doesn't seem to work with values being passed to it from a function.

Any idea what I'm doing wrong here? Is there a better way to do this?

Thank you for your time.

EDIT:

I turned @SymbolixAU's response into a function, like so:

#Aggregate Function
#takes input df = object; n = name of new column in double quotes; l = columns you want to agg
agger <- function(df, l, n) {
#checks if the sum of the rows in the specified columns is greater than 1
#this produces a logical value which is multiplied by 1 to change it to numeric
df[n] <- ((rowSums(df[, l] == 1) > 0) * 1)
}


Follow-up question - I am trying to use mapply to pass a list "x" of two different vectors of columns to the argument "l" and a vector "y" of two names for the two new columns that will be created and the target object df = BR. The command looks like this:

mapply(agger, l = x, n = y, MoreArgs = list(BR))


This is sending me to the debug window with no messages or info on what is going wrong. Is my mapply set up incorrectly and/or is there a better way to run this function on multiple groups of columns in the same dataframe?

Thank you.

Answer

This can be done in one line without any loops or *applys:

df$new_col <- ((rowSums(df[, col] == 1, na.rm=T) > 0) * 1)

Explanation

  1. You can check if a value in a subset of columns is equal to 1 using df[, col] == 1

  2. Then you can check how many of those values there are in each row using rowSums()

  3. Then if there are any rowSums results with a value greater than 0, you know there was at least one 1 in one of the columns in that row

  4. the > 0 check returns a logical, so multiply it by 1 to turn it into numeric


Example

## taking the data you've provide
df <- read.table(text = "ID Response IM2   IM4  IM10  IM16
1   4,7,10  NA     1          1    NA
2 7,5,16,8  NA    NA         NA     1
3     2,10   1    NA          1    NA", header = T)


df
#   ID Response IM2 IM4 IM10 IM16
# 1  1   4,7,10  NA   1    1   NA
# 2  2 7,5,16,8  NA  NA   NA    1
# 3  3     2,10   1  NA    1   NA

## specify the columns of interest
col <- c("IM4", "IM10")

## assign the new column
df$new_col <- ((rowSums(df[, col] == 1, na.rm=T) > 0) * 1)
df
#    ID Response IM2 IM4 IM10 IM16 new_col
# 1  1   4,7,10  NA   1    1   NA       1
# 2  2 7,5,16,8  NA  NA   NA    1       0
# 3  3     2,10   1  NA    1   NA       1

And this also works on the data provided by @Barker

set.seed(100)
df <- data.frame(ID = 1:20)
df[paste0("IM", 1:7)] <- replicate(7,sample(c(rep(NA,5),1,1),20, replace = TRUE))

col <- paste0("IM", 1:7)
df$new_col <- ((rowSums(df[, col] == 1, na.rm=T) > 0) * 1)