nilsole - 2 months ago
R Question

Checking multiple data frame columns at once (flexible manner)

Looking for a better way: How can I make R check the values of a flexible subset of multiple columns element-wise (let's say Var2 and Var3 here) and write the result of the check to a new logical column?

Is there a shorter, more elegant way than using row-wise apply() here?

df <- read.csv(
text = '"Var1","Var2","Var3"
"","",""
"","","a"
"","a",""
"a","a","a"
"a","","a"
"","a",""
"","",""
"","","a"
"","a",""
"","","a"'
)

criticalColumns <- c("Var2", "Var3")

df$criticalColumnsAreEmpty <-
  apply(df[, criticalColumns], 1, function(curRow) {
    return(all(curRow == ""))
  })


I could also do this explicitly, but then it is not flexible:

df$criticalColumnsAreEmpty <- df$Var2 == "" & df$Var3 == ""


Desired output:

Var1 Var2 Var3 criticalColumnsAreEmpty
               TRUE
          a    FALSE
     a         FALSE
a    a    a    FALSE
a         a    FALSE
     a         FALSE
               TRUE
          a    FALSE
     a         FALSE
          a    FALSE

Answer

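We can use rowSums on the logical matrix. df[criticalColumns] != "" is TRUE for every non-empty cell, so a row sum of 0 means all critical columns are empty, and ! turns that 0 into TRUE: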

df$criticalColumnsAreEmpty <- !rowSums(df[criticalColumns]!="")
df$criticalColumnsAreEmpty
#[1]  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
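
To make the one-liner easier to follow, here is a minimal sketch that splits it into its intermediate steps. The data frame toy and the names nonEmpty, nonEmptyCount and allEmpty are made up for illustration and do not appear in the question.

```r
# Small hypothetical data frame (not the one from the question)
toy <- data.frame(Var1 = c("", "a", ""),
                  Var2 = c("", "a", ""),
                  Var3 = c("", "", "a"),
                  stringsAsFactors = FALSE)
cols <- c("Var2", "Var3")

# Comparing a data frame with "" yields a logical matrix:
# TRUE where a cell has content, FALSE where it is the empty string
nonEmpty <- toy[cols] != ""

# Number of non-empty critical cells in each row
nonEmptyCount <- rowSums(nonEmpty)

# ! coerces the counts to logical: 0 becomes TRUE, anything else FALSE
allEmpty <- !nonEmptyCount
```

Only the first row of toy has both Var2 and Var3 empty, so only its count is 0.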

Another option, which avoids converting to a matrix and so saves memory on big datasets, is to loop over the columns, check whether the elements are blank, and combine the results with Reduce and &:

Reduce(`&`, lapply(df[criticalColumns], function(x) !nzchar(as.character(x))))
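
The Reduce call returns a logical vector that can be assigned to the new column just like the rowSums result. As a rough sketch, it could be wrapped in a small function so that the column subset stays flexible. The helper name allBlank and the data frame toy are assumptions for illustration, not part of base R or of the question.

```r
# allBlank() is a hypothetical helper name, not a base R function
allBlank <- function(data, cols) {
  # One logical vector per column: TRUE where the value is the empty string
  blankPerColumn <- lapply(data[cols], function(x) !nzchar(as.character(x)))
  # Element-wise AND across all of those vectors
  Reduce(`&`, blankPerColumn)
}

# Small hypothetical data frame (not the one from the question)
toy <- data.frame(Var1 = c("", "a", ""),
                  Var2 = c("", "", "a"),
                  Var3 = c("", "", ""),
                  stringsAsFactors = FALSE)

# Changing the subset only means changing the character vector
toy$criticalColumnsAreEmpty <- allBlank(toy, c("Var2", "Var3"))
toy$allColumnsAreEmpty <- allBlank(toy, c("Var1", "Var2", "Var3"))
```

The as.character() call is kept from the answer so that the check also works when read.csv has turned the columns into factors.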