user3423201 - 1 year ago 51

R Question

I would like to change a data.table by doing a join within a function. I understand that data.tables work by reference, so I assumed that reassigning a joined version of a data.table to itself would change the original data.table. What simple thing have I misunderstood?

Thanks!

```r
library('data.table')

# function to restrict DT to subset, by join
join_test <- function(DT) {
  test_dt = data.table(a = c('a', 'b'), c = c('x', 'y'))
  setkey(test_dt, 'a')
  setkey(DT, 'a')
  DT <- DT[test_dt]
}

DT = data.table(a = c("a","b","c"), b = 1:3)

print(DT)
#    a b
# 1: a 1
# 2: b 2
# 3: c 3

haskey(DT)
# [1] FALSE

join_test(DT)

print(DT)
#    a b
# 1: a 1
# 2: b 2
# 3: c 3

haskey(DT)
# [1] TRUE
```

(The haskey calls are included just to double-check that some of the by-reference changes do work.)
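A minimal sketch of the distinction at play here, using two hypothetical helpers (`rebind_locally` and `modify_in_place` are illustrative names, not part of data.table): `:=` modifies the data.table the caller passed in, while `<-` inside a function only rebinds the function's local variable, so the caller never sees the new table.

```r
library(data.table)

# hypothetical helper: `<-` only rebinds the local name `DT`
rebind_locally <- function(DT) {
  DT <- DT[a != "c"]   # builds a new data.table; the caller's DT is unaffected
  invisible(NULL)
}

# hypothetical helper: `:=` adds a column to the caller's data.table in place
modify_in_place <- function(DT) {
  DT[, flag := a != "c"]
  invisible(NULL)
}

DT <- data.table(a = c("a", "b", "c"), b = 1:3)

rebind_locally(DT)
nrow(DT)    # still 3

modify_in_place(DT)
names(DT)   # now includes "flag"
```

This is the same reason `haskey(DT)` flips to `TRUE` in the question: `setkey` is one of the by-reference `set*` functions, whereas the `DT <- DT[test_dt]` line behaves like `rebind_locally`.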

Answer

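The reason the original doesn't work: `DT[test_dt]` builds a new data.table, and `DT <- ...` inside the function only rebinds the function's local variable `DT`, so the caller's object is untouched. Only `:=` and the `set*` functions (such as `setkey`, which is why `haskey` changes) modify a data.table by reference.

You can still do it by reference, since you can join and assign columns by reference based on the joined values without saving the joined table back. However, you need to explicitly pick the columns you're after: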

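```r
library(data.table)

join_test <- function(DT) {
  test_dt = data.table(a = c('a', 'b'), c = c('x', 'y'))
  # update join: add column c to DT in place, taking values from test_dt (i.c);
  # rows of DT with no match in test_dt get NA
  DT[test_dt, c := i.c, on = 'a']
}
```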
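A sketch of calling the function above (repeated here so the snippet runs on its own). The caller's `DT` gains a column `c`, with `NA` for the row that has no match. An update join cannot drop rows, so if the goal really is to restrict `DT` to the matching rows, one option (an alternative, not part of the answer above) is to return the joined table and reassign it at the call site; `join_subset` is a hypothetical name.

```r
library(data.table)

# the update join from the answer, repeated so this snippet is self-contained
join_test <- function(DT) {
  test_dt <- data.table(a = c('a', 'b'), c = c('x', 'y'))
  DT[test_dt, c := i.c, on = 'a']
}

DT <- data.table(a = c("a", "b", "c"), b = 1:3)
join_test(DT)
print(DT)   # column c now holds "x", "y", NA

# hypothetical alternative: return the joined subset and reassign in the caller
join_subset <- function(DT) {
  test_dt <- data.table(a = c('a', 'b'), c = c('x', 'y'))
  # nomatch = NULL drops rows of test_dt that have no match in DT
  DT[test_dt, on = 'a', nomatch = NULL]
}

DT2 <- data.table(a = c("a", "b", "c"), b = 1:3)
DT2 <- join_subset(DT2)
print(DT2)  # only rows "a" and "b", with columns a, b, c
```

The reassignment `DT2 <- join_subset(DT2)` happens in the caller's scope, which is why it sticks where the `<-` inside the question's function did not.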