Joehax Joehax - 2 months ago 7
R Question

Assigning a subset of a data.table to a subset of another data.table

I have two data.tables:

library(data.table)
dt1 <- data.table(A=1:10, B=1, C=2, D=3)
dt2 <- data.table(A2=5:9, B2=4, C2=5, D2=6)


I want to assign elements of column B2 and C2 in dt2 to column B and C in dt1 where A = A2, such that I get a table that looks like this:

dt1
# A B C D
# 1: 1 1 2 3
# 2: 2 1 2 3
# 3: 3 1 2 3
# 4: 4 1 2 3
# 5: 5 4 5 3
# 6: 6 4 5 3
# 7: 7 4 5 3
# 8: 8 4 5 3
# 9: 9 4 5 3
# 10: 10 1 2 3


I know I can assign one column at a time:

id1 <- which(dt1$A %in% dt2$A2)
id2 <- which(dt2$A2 %in% dt1$A)
dt1$B[id1] <- dt2$B2[id2]
dt1$C[id1] <- dt2$C2[id2]


however this seems like a lot of lines of code to do only very little, especially if I had many columns.

I was thinking something like

# Not working:
dt1[id1][,list(B,C)] <- dt2[id2][,list(B2,C2)]


would work, but I get an error message.

Is there a smarter, nicer way to do it?

Answer

We can do a join on the 'A' with 'A2' and assign (:=) the values of columns of interest to 'C' and 'D'

library(data.table)#v1.9.7+
dt1[dt2,  c('C', 'D') := .(C2, D2),  on = .(A= A2)]
dt1
#     A B C D
# 1:  1 1 2 3
# 2:  2 1 2 3
# 3:  3 1 2 3
# 4:  4 1 2 3
# 5:  5 1 5 6
# 6:  6 1 5 6
# 7:  7 1 5 6
# 8:  8 1 5 6
# 9:  9 1 5 6
#10: 10 1 2 3

If there are many columns that we need to change, instead of typing out the columns, we get the values using mget and assign it to the corresponding column names in 'dt1'

dt1[dt2, names(dt1)[3:4] :=  mget(names(dt2)[3:4]), on = .(A = A2)]