Phil - 1 year ago 62

# R - Randomly selecting variables and manipulating them on a row-wise basis

I'm trying to go through each row of my data frame, randomly select half of the variables, and set those variables to `NA` for that particular row.

For example, with the `mydf` dataset below, I'd like for my first row to randomly select 3 variables (say `QB`, `QE`, `QF`) and set their scores to `NA`, then again for the 2nd row (say `QA`, `QD`, `QE`) and so forth:

``````
library(tibble)
mydf <- tibble(QA = rnorm(100),
               QB = rnorm(100),
               QC = rnorm(100),
               QD = rnorm(100),
               QE = rnorm(100),
               QF = rnorm(100))
``````

My attempt, but it doesn't appear to do anything:

``````
vars <- names(mydf)
for (i in nrow(mydf)){
  miss_vars <- sample(vars, 3)
  for (j in miss_vars) {
    mydf[i,j] <- NA
    #mydf[i,][[j]] <- NA #Also tried this.
  }
}
``````

Try this vectorized approach:

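``````
m <- as.matrix(mydf)
n <- 3  # number of columns to set to NA in each row
# Two-column index matrix: column 1 holds the row numbers (each repeated n times),
# column 2 holds the n randomly sampled column numbers for that row
inds <- cbind(rep(seq_len(nrow(mydf)), each = n),
              c(replicate(nrow(mydf), sample(ncol(mydf), n))))
m[inds] <- NA
res <- as.data.frame(m)
``````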

Here is how it works:

1. Convert the data frame to a matrix, which can be indexed with a two-column matrix of (row, column) positions
2. Define the number of columns to be selected randomly per row
3. Generate the matrix `inds`, in which each row pairs a row number of the data with one randomly chosen column number
4. Set the cells at those row and column positions to `NA`
5. Convert the matrix back to a data frame
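The index-matrix step is easier to see on a small example. The following is a minimal sketch and not part of the original answer: the `toy` matrix, the seed, and `n = 2` are made up purely for illustration.

```r
# Toy example: 4 rows, 6 columns, n = 2 cells set to NA per row
set.seed(42)  # arbitrary seed, only so the sketch is repeatable
toy <- matrix(seq_len(24), nrow = 4, ncol = 6)
n <- 2

# replicate() returns an n x nrow matrix; c() flattens it column by column,
# so the first n values belong to row 1, the next n to row 2, and so on
cols <- c(replicate(nrow(toy), sample(ncol(toy), n)))
rows <- rep(seq_len(nrow(toy)), each = n)
inds <- cbind(rows, cols)

# Each row of inds addresses exactly one cell of toy
toy[inds] <- NA
print(inds)
print(toy)
```

Because `sample()` draws without replacement by default, the `n` column numbers chosen for a given row are distinct, so every row ends up with exactly `n` missing values.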

In `res`, you will have a data frame in which 3 randomly chosen columns are set to `NA` in each row. The output for the data frame given under "data" below is:

``````
#             QA          QB          QC        QD         QE         QF
# 1  -0.6264538          NA          NA  1.358680 -0.1645236         NA
# 2   0.1836433          NA  0.78213630        NA -0.2533617         NA
# 3          NA          NA  0.07456498        NA  0.6969634  0.3411197
# 4          NA -2.21469989          NA        NA  0.5566632 -1.1293631
# 5          NA  1.12493092  0.61982575        NA         NA  1.4330237
# 6  -0.8204684 -0.04493361          NA        NA         NA  1.9803999
# 7   0.4874291 -0.01619026          NA -0.394290         NA         NA
# 8   0.7383247          NA -1.47075238        NA         NA -1.0441346
# 9          NA  0.82122120          NA  1.100025         NA  0.5697196
# 10         NA  0.59390132  0.41794156        NA         NA -0.1350546
``````
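Since the question asks for half of the variables per row, the same approach could be wrapped in a small function. This is only a sketch: `set_random_na` and its default for `n` are hypothetical names and choices, not part of the answer, and it assumes every column is numeric (otherwise `as.matrix()` would coerce the values to character).

```r
# Hypothetical helper (name and default are illustrative, not from the answer).
# Assumes all columns of df are numeric.
set_random_na <- function(df, n = ncol(df) %/% 2) {
  m <- as.matrix(df)
  rows <- rep(seq_len(nrow(m)), each = n)            # each row number n times
  cols <- c(replicate(nrow(m), sample(ncol(m), n)))  # n random columns per row
  m[cbind(rows, cols)] <- NA
  as.data.frame(m)
}

set.seed(1)
mydf <- data.frame(QA = rnorm(10), QB = rnorm(10), QC = rnorm(10),
                   QD = rnorm(10), QE = rnorm(10), QF = rnorm(10))
res <- set_random_na(mydf)

# Sanity check: with 6 columns, every row should contain exactly 3 NAs
stopifnot(all(rowSums(is.na(res)) == 3))
```

The `rowSums(is.na(res))` check is a quick way to confirm that the number of missing values per row is what was intended.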

Data used for the output above:

``````
set.seed(1)
mydf <- data.frame(QA = rnorm(10),
                   QB = rnorm(10),
                   QC = rnorm(10),
                   QD = rnorm(10),
                   QE = rnorm(10),
                   QF = rnorm(10))
``````
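As for why the loop in the question appears to do nothing: `for (i in nrow(mydf))` iterates over the single value `nrow(mydf)`, so only the last row is modified, and a tibble prints only its first 10 rows by default. A minimal sketch of a corrected loop follows. It uses a plain `data.frame` with 10 rows to stay self-contained, which is an assumption for illustration rather than the asker's exact setup.

```r
set.seed(1)
mydf <- data.frame(QA = rnorm(10), QB = rnorm(10), QC = rnorm(10),
                   QD = rnorm(10), QE = rnorm(10), QF = rnorm(10))
vars <- names(mydf)

# nrow(mydf) is one number, so looping over it runs the body only once;
# seq_len(nrow(mydf)) yields 1, 2, ..., nrow(mydf), one iteration per row
for (i in seq_len(nrow(mydf))) {
  miss_vars <- sample(vars, 3)  # 3 distinct column names for this row
  mydf[i, miss_vars] <- NA      # blank all three cells in one assignment
}

# Every row should now have exactly 3 missing values
stopifnot(all(rowSums(is.na(mydf)) == 3))
```

The loop is simpler to read than the index-matrix version but performs one assignment per row, so the vectorized approach is likely to scale better on large data.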