JaimeCr JaimeCr - 3 months ago 12
R Question

data.table reference column in another data.table by name

I have the following data.tables:

Comparison <- data.table(code = c("AAA", "BBB"),
elem1 = c(1, 2),
elem2 = c(4, 4))

DT <- data.table(A = c("AAA", "AAA", "AAA", "AAA"),
B = c("BBB", "BBB", "BBB", "BBB"),
C = c(1, 2, 3, 4))


Now, I would want to add a new column based on a comparison of a column from
Comparison
and from
DT
. The following command generates the expected output:

DT[, newCol := {ifelse( abs(C - Comparison[code == "AAA", elem2]) == 0, "0", "1")}]

Output:

A B C newCol
1: AAA BBB 1 1
2: AAA BBB 2 1
3: AAA BBB 3 1
4: AAA BBB 4 0


However, if instead of hardcoding the column value of the column
A
, I use the column itself with this:

DT[, newCol := {ifelse( abs(C - Comparison[code == A, elem2]) > 0, "0", "1")}]


It outputs the following error, which I'm not sure how to avoid:

Error in `[.data.table`(Comparison, code == A, elem2) :
RHS of == is length 4 which is not 1 or nrow (2). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %in% instead.


It seems to me that the operation is not vectorizing the elements of the column
A
of
DT
in
Comparison
and I don't really understand why, because the elements of the column
C
are taken properly (i.e. it uses the elements of
C
individually but not the elements of
A
). How could I perform this comparison?

Any help would be greatly appreciated.

Answer

We can use join with on

DT[Comparison, newCol := as.integer(C != elem2), on = c("A" = "code"), nomatch = 0]
DT
#     A   B C newCol
#1: AAA BBB 1      1
#2: AAA BBB 2      1
#3: AAA BBB 3      1
#4: AAA BBB 4      0