Bipero Bipero -4 years ago 97
R Question

Randomly drop a column selected from a group, excluding one

I have the following data frame, which is going to be used as an input in a logit regression:

my_frame<-data.frame(y=c(1,0,1),A=c(0,1,1),B=c(1,0,0),C=c(0,0,0),t=c(1,1,1),x=c(1,0,0),z=c(1,0,1))


Knowing that the dummy variables A, B and C are connected through a linear equation (A+B+C=1), I need to drop one of the three before proceeding.

y A B C t x z
1 0 1 0 1 1 1
0 1 0 0 1 0 0
1 1 0 0 1 0 1


Now, here is the difficult part. I want to randomly exclude one column from the group consisting of A, B, C and D, but not the one that has the value 1 in the last row of the data frame.
In my example, A has 1 in the last row, so either B or C should be excluded at random.

Column D is not present because in this particular data frame it would always be 0, but it is still part of the same group of variables.

Answer

I don't really get what you mean by your last sentence about column D, but anyway, you could try this:

my_frame<-data.frame(y=c(1,0,1),A=c(0,1,1),B=c(1,0,0),C=c(0,0,0),t=c(1,1,1),x=c(1,0,0),z=c(1,0,1))

allRelevantCols <- c("A", "B", "C")

# Columns that may be dropped: those with 0 in the last row
allColsToExclude <- allRelevantCols[which(my_frame[nrow(my_frame), allRelevantCols] == 0)]

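n_runs <- 3  # placeholder value: how often you would like to run this
results <- vector("list", n_runs)
for (i in seq_len(n_runs)) {
  # Pick one droppable column at random
  colsToExclude <- sample(allColsToExclude, 1)
  # Keep every column except the sampled one, and store the reduced frame
  results[[i]] <- my_frame[, !(colnames(my_frame) %in% colsToExclude), drop = FALSE]
}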
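If you need this more than once, the same idea could be wrapped in a small function. This is only a sketch: the name drop_random_dummy is made up for illustration, and it assumes that group members missing from the data frame (such as D here) should simply be ignored, which is my reading of the question rather than something it states.

```r
# Hypothetical helper (not part of base R): drop one random column from a
# dummy group, never the column that equals 1 in the last row.
drop_random_dummy <- function(df, group) {
  # Assumption: group members absent from df (e.g. "D") are ignored
  present <- intersect(group, colnames(df))
  last_row <- unlist(df[nrow(df), present, drop = FALSE])
  candidates <- present[last_row == 0]
  if (length(candidates) == 0) {
    stop("no column in the group has 0 in the last row")
  }
  # Sample an index rather than the vector itself, so a single
  # candidate is returned as-is
  drop_col <- candidates[sample.int(length(candidates), 1)]
  df[, colnames(df) != drop_col, drop = FALSE]
}

my_frame <- data.frame(y = c(1, 0, 1), A = c(0, 1, 1), B = c(1, 0, 0),
                       C = c(0, 0, 0), t = c(1, 1, 1), x = c(1, 0, 0),
                       z = c(1, 0, 1))

reduced <- drop_random_dummy(my_frame, c("A", "B", "C", "D"))
colnames(reduced)  # A is always kept; exactly one of B or C is gone
```

Using drop = FALSE keeps the result a data frame even if only one column remains, and the explicit stop() avoids the silent "zero columns" result you would get from negative indexing with an empty selection.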