LizPS, 1 year ago

R Question

I can usually figure out how to vectorize with a little thought, but despite reading through a bunch of Stack Overflow Q&As, I'm still stumped!

I want to replace these nested `for` loops with a suitable `apply` function, but if there is an obviously different approach to the whole problem that I've missed, feel free to tell me!

Think of this example in the context of a test where the first row is the key and each subsequent row is a student's answers. As output, I want an array with a 1 for every correct answer, a 0 for every incorrect answer, and NA where an answer is missing. The `for` loops work, but they are VERY slow when scaled up to thousands of rows and columns.

Here's my reproducible example, and thanks in advance for any help!

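```
# build sample data: 9 students (rows), an ID column plus 5 answers each
dat <- array(dim=c(9,6))
for (n in 1:9){
  dat[n,1:6] <- c(paste("ID00",n,sep=""),
                  sample(c("A","B","C","D"), size=5, replace=TRUE))}
dat[3,4]<-NA                        # one missing answer (ID003, column 4)
key<-c("key","A","B","B","C","D")   # answer key, same layout as a student row
dat <- rbind(key,dat)               # the key becomes row 1
```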

```
> dat
[,1]    [,2] [,3] [,4] [,5] [,6]
"key"   "A"  "B"  "B"  "C"  "D"
"ID001" "B"  "A"  "D"  "B"  "C"
"ID002" "C"  "C"  "C"  "B"  "B"
"ID003" "A"  "C"  NA   "D"  "D"
"ID004" "D"  "B"  "D"  "A"  "A"
"ID005" "A"  "C"  "A"  "C"  "A"
"ID006" "D"  "D"  "B"  "B"  "A"
"ID007" "B"  "D"  "A"  "D"  "A"
"ID008" "D"  "D"  "B"  "D"  "A"
"ID009" "D"  "C"  "B"  "D"  "D"
```

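```
# score file: 1 = correct, 0 = incorrect, NA = missing answer
dat2 <- array(dim=c(9,5))
for (row in 2:10){           # student rows (row 1 is the key)
  for (column in 2:6){       # answer columns (column 1 is the ID)
    if (is.na(dat[row,column])){
      p <- NA                # if() on an NA comparison is an error, so test first
    }else if (dat[row,column]==dat[1,column]){
      p <- 1
    }else p <- 0
    dat2[row-1,column-1]<-p
  }
}
```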

```
> dat2
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    0    0    0    0
[3,]    1    0   NA    0    1
[4,]    0    1    0    0    0
[5,]    1    0    0    1    0
[6,]    0    0    1    0    0
[7,]    0    0    0    0    0
[8,]    0    0    1    0    0
[9,]    0    0    1    0    1
```
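Since the question asks specifically about `apply()`, here is one way the nested loops could be expressed with it. This is a sketch on hard-coded data (the key and the first three student rows of the `dat` printout above); the names `answer_key`, `responses` and `scores` are illustrative. Note that `apply()` still runs an R-level loop over the rows, so it is usually slower than a single vectorized comparison over the whole matrix.

```r
# Key row plus the first three student rows, copied from the printed sample data
dat <- rbind(
  c("key",   "A", "B", "B", "C", "D"),
  c("ID001", "B", "A", "D", "B", "C"),
  c("ID002", "C", "C", "C", "B", "B"),
  c("ID003", "A", "C", NA,  "D", "D")
)

answer_key <- dat[1, -1]   # correct answers (columns 2 to 6)
responses  <- dat[-1, -1]  # one row per student

# apply() over rows (MARGIN = 1): each row is compared element-wise with the key,
# and as.integer() turns TRUE/FALSE into 1/0 while keeping NA as NA.
# apply() returns one column per input row, so t() restores one row per student.
scores <- t(apply(responses, 1, function(r) as.integer(r == answer_key)))

print(scores)
```

The first two students score 0 everywhere, and the third gets `1 0 NA 0 1`, matching rows 1 to 3 of `dat2`.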

Answer

Set a seed for reproducibility:

```
set.seed(1)
dat <- array(dim=c(9,6))
for (n in 1:9){
dat[n,1:6] <- c(paste("ID00",n,sep=""),
sample(c("A","B","C","D"), size=5, replace=TRUE))}
dat[3,4]<-NA
key<-c("key","A","B","B","C","D")
dat <- rbind(key,dat)
```

This will do the job:

```
key <- rep(dat[1, -1], each = nrow(dat) - 1L) ## expand "key" row
dummy <- (dat[-1, -1] == key) + 0L ## vectorized / element-wise "=="
```

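Basically, we want a vectorized `"=="`. But we first need to expand `dat[1,-1]` to match the size of `dat[-1,-1]`. Because R stores a matrix column by column, repeating each key entry `nrow(dat) - 1L` times (`each =`) lines every answer up with the key entry for its own column. Finally, `+ 0L` coerces the `TRUE / FALSE` matrix to a `1 / 0` matrix. A missing answer compares as `NA`, and `NA + 0L` is still `NA`, so missing answers stay `NA` without a separate `is.na()` check.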
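The `each =` argument is what makes the alignment work. `==` between a matrix and a plain vector pairs elements in column-major order (all of column 1, then column 2, and so on), so the expanded key has to follow that order too. A small sketch with a hypothetical 2-student, 3-question matrix shows the difference between `each =` and `times =`:

```r
# Hypothetical answers: 2 students (rows) x 3 questions (columns)
answers <- matrix(c("A", "B",    # question 1 answers
                    "C", "C",    # question 2 answers
                    "D", "A"),   # question 3 answers
                  nrow = 2)      # matrix() fills column by column
answer_key <- c("A", "C", "B")

# each = nrow(answers) gives A A C C B B, matching the column-major layout
key_each <- rep(answer_key, each = nrow(answers))
good <- (answers == key_each) + 0L

# times = nrow(answers) gives A C B A C B, which misaligns the key
key_times <- rep(answer_key, times = nrow(answers))
bad <- (answers == key_times) + 0L

print(good)
print(bad)
```

`good` scores the first student `1 1 0` and the second `0 1 0`, which is correct; in `bad` the recycled key drifts across columns and only the first answer is scored correctly.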

```
# [,1] [,2] [,3] [,4] [,5]
# 0 1 0 0 0
# 0 0 0 1 0
# 1 0 NA 0 1
# 0 0 0 0 1
# 0 0 0 0 0
# 0 0 1 0 0
# 0 0 1 0 1
# 0 0 0 1 0
# 0 0 0 1 0
```
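To check the one-liner against the question's nested loops, and to compare it with two other base-R idioms that do the key expansion implicitly (`t(t(x) == k)` and `sweep()`, neither of which is part of the original answer), here is a sketch using the key and the first three student rows of the question's printed data. The names `responses`, `answer_key`, `via_rep`, `via_t` and `via_sweep` are illustrative.

```r
# Key row plus the first three student rows, copied from the question's printed data
dat <- rbind(
  c("key",   "A", "B", "B", "C", "D"),
  c("ID001", "B", "A", "D", "B", "C"),
  c("ID002", "C", "C", "C", "B", "B"),
  c("ID003", "A", "C", NA,  "D", "D")
)
responses  <- dat[-1, -1]
answer_key <- dat[1, -1]

# Reference result: the question's nested loops, with bounds taken from dat
dat2 <- array(dim = dim(responses))
for (row in 2:nrow(dat)) {
  for (column in 2:ncol(dat)) {
    if (is.na(dat[row, column])) {
      p <- NA
    } else if (dat[row, column] == dat[1, column]) {
      p <- 1
    } else p <- 0
    dat2[row - 1, column - 1] <- p
  }
}

# The answer's approach: expand the key explicitly in column-major order
via_rep <- (responses == rep(answer_key, each = nrow(responses))) + 0L

# Alternative 1: in t(responses) each column is one student, so the
# length-5 key recycles once per student; the outer t() restores the layout
via_t <- t(t(responses) == answer_key) + 0L

# Alternative 2: sweep() with MARGIN = 2 compares column j with answer_key[j]
via_sweep <- sweep(responses, 2, answer_key, FUN = "==") + 0L

# Same NA positions and same 0/1 values as the loop
matches_loop <- identical(is.na(via_rep), is.na(dat2)) &&
  all(via_rep == dat2, na.rm = TRUE)
print(matches_loop)

# The three vectorized versions agree exactly
print(identical(via_rep, via_t) && identical(via_rep, via_sweep))
```

All three vectorized forms compare the whole matrix in one call; which one to use is mostly a matter of readability.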