Sameer Bhatnagar - 1 month ago
R Question

R: ifelse assignment in data.table

I am a teacher, and would like to correctly use the data.table package in R to automatically grade student answers in a log file, i.e. add a column called correct that is 1 if the student's answer to a particular question is a correct answer to that question, and 0 otherwise. I can do this easily if each question has only one answer, but I am getting tripped up when a question has multiple possible answers (questions and their possible correct answers are stored in another table).

Below is a MWE:

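library(data.table)

set.seed(123)
# one row per (question, correct answer) pair; a question may appear more than once
question_table <- data.table(id = c(1,1,2,2,3,4), correct_ans = sample(1:4, 6, replace = TRUE))
# one row per answer submitted by a student
log <- data.table(student = sample(letters[1:3], 10, replace = TRUE),
                  question_id = c(1,1,1,2,2,2,3,3,4,4),
                  student_answer = c(2,4,1,3,2,4,4,5,2,1))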


My question is: what is the correct data.table way to use ifelse in j, especially if we depend on another table?

log[,correct:=ifelse(student_answer %in%
question_table[log$question_id %in% id]$correct_ans,1,0)]


As can be seen below, questions 1 and 2 both have multiple possible correct answers.

> question_table
   id correct_ans
1:  1           2
2:  1           4
3:  2           2
4:  2           4
5:  3           4
6:  4           1


While the correct column is calculated without errors, something isn't right: e.g. when student b answers question 1 with 1 (row 3 below), he is awarded a correct score, even though he answered incorrectly. Only some entries of the correct column are off, which leads me to believe there is something I am not getting about how variables are scoped.

> log
    student question_id student_answer correct
 1:       b           1              2       1
 2:       c           1              4       1
 3:       b           1              1       1 <- ?
 4:       b           2              3       0
 5:       c           2              2       1
 6:       b           2              4       1
 7:       c           3              4       1
 8:       b           3              5       0
 9:       a           4              2       1 <- ?
10:       c           4              1       1


I considered making a helper column with the correct answer in the log table by joining with question_table, but that does not work since the key is not unique in the latter.

Any and all help would be appreciated.
Thanks in advance.

Answer

You can use a join:

# initialize to zero
log[, correct := 0L ]

# update to 1 if matched
log[question_table, on=c(question_id = "id", student_answer = "correct_ans"),
   correct := 1L ] 

    student question_id student_answer correct
 1:       b           1              2       1
 2:       c           1              4       1
 3:       b           1              1       0
 4:       b           2              3       0
 5:       c           2              2       1
 6:       b           2              4       1
 7:       c           3              4       1
 8:       b           3              5       0
 9:       a           4              2       0
10:       c           4              1       1
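
As for why the ifelse attempt marks rows 3 and 9 as correct: the output in the question is consistent with each student_answer being tested against the pool of all correct answers (1, 2 and 4), rather than against the answers for that row's own question. Below is a minimal sketch that compares the two approaches. It assumes the tables are hard-coded from the printed output in the question (so it does not depend on set.seed), and it uses the hypothetical name answers instead of log to avoid masking base R's log function.

```r
library(data.table)

# Tables typed in from the question's printed output (an assumption made
# here so the example does not depend on the random number generator).
question_table <- data.table(id          = c(1, 1, 2, 2, 3, 4),
                             correct_ans = c(2, 4, 2, 4, 4, 1))
answers <- data.table(student        = c("b","c","b","b","c","b","c","b","a","c"),
                      question_id    = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                      student_answer = c(2, 4, 1, 3, 2, 4, 4, 5, 2, 1))

# Pooled test: is the answer among ANY question's correct answers?
# This reproduces the "correct" column shown in the question.
pooled <- as.integer(answers$student_answer %in% question_table$correct_ans)

# Update join: only rows of answers whose (question_id, student_answer)
# pair appears in question_table are set to 1.
answers[, correct := 0L]
answers[question_table,
        on = c(question_id = "id", student_answer = "correct_ans"),
        correct := 1L]

# Rows where the two approaches disagree
which(pooled != answers$correct)
```

The non-unique id in question_table is not a problem here, because the join is on the (question, answer) pair rather than on the question alone, and each pair appears only once.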