user3423201 - 1 month ago
R Question

Joining data.tables within a function

I would like to modify a data.table by doing a join within a function. I understand that data.tables work by reference, so I assumed that reassigning a joined version of a data.table to itself would change the original data.table. What simple thing have I misunderstood?

Thanks!

library('data.table')

# function to restrict DT to subset, by join
join_test <- function(DT) {
    test_dt = data.table(a = c('a', 'b'), c = c('x', 'y'))
    setkey(test_dt, 'a')
    setkey(DT, 'a')

    DT <- DT[test_dt]
}

DT = data.table(a = c("a","b","c"), b = 1:3)
print(DT)
#    a b
# 1: a 1
# 2: b 2
# 3: c 3
haskey(DT)
# [1] FALSE

join_test(DT)
print(DT)
#    a b
# 1: a 1
# 2: b 2
# 3: c 3
haskey(DT)
# [1] TRUE


(The haskey calls are included just to double-check that at least some of the by-reference changes take effect.)

Answer

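You can do it by reference, since you can join and assign columns by reference based on the joined values, without actually saving the joined table back. However, you need to explicitly pick the columns you're after. Your version does not work because DT <- DT[test_dt] builds a new table and binds it to the name DT inside the function only; the caller's DT is never touched. setkey, by contrast, does modify its argument in place, which is why haskey(DT) changes.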

join_test <- function(DT) {
    test_dt = data.table(a = c('a', 'b'), c = c('x', 'y'))
    # update join: add test_dt's column c to DT in place, matched on 'a'
    DT[test_dt, c := c, on = 'a']
}
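
As a rough sketch of how this behaves on the DT from the question: the update join adds column c to the caller's table, but it does not drop any rows, so the unmatched row "c" stays and gets NA. If the goal really is to restrict DT to the matching rows, one option is to return the joined table and reassign it at the call site. The name join_subset below is hypothetical, and i.c is just an explicit way of referring to test_dt's column c.

```r
library(data.table)

# Update join, as in the answer: adds column c to DT by reference.
join_test <- function(DT) {
    test_dt <- data.table(a = c('a', 'b'), c = c('x', 'y'))
    # i.c names the column c coming from test_dt (the i table)
    DT[test_dt, c := i.c, on = 'a']
    invisible(DT)
}

DT <- data.table(a = c("a", "b", "c"), b = 1:3)
join_test(DT)
# DT now has a third column c; all three rows are still present
print(DT)

# Hypothetical alternative: return the joined table instead of
# trying to overwrite DT inside the function.
join_subset <- function(DT) {
    test_dt <- data.table(a = c('a', 'b'), c = c('x', 'y'))
    DT[test_dt, on = 'a']
}

DT2 <- data.table(a = c("a", "b", "c"), b = 1:3)
DT2 <- join_subset(DT2)  # the reassignment happens in the caller
print(DT2)
```

The first form changes the caller's table without any reassignment; the second is an ordinary function that returns a new table, so the caller decides whether to overwrite the original.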