Endre Grüner Ofstad - 4 months ago
R Question

Connected copies of data.tables

During my workflow I often make a copy of my main data.frame/data.table, do some of the work on one copy and some on the other, and then join them later on. However, I often find that these copies are still connected to each other, so that edits made to one also show up in the other. Unfortunately I am not able to reproduce it in a minimal example, but copy-pasted from my console it looks something like this:

# 'used3' is a copy of 'used' with some alterations to it
> c("nLocs","nDays") %in% names(used)
[1] FALSE FALSE
> used3[, nDays :=uniqueN(yDay),c("ID","Year","Season")]
> used3[, nLocs :=.N,c("ID","Year","Season")]
> c("nLocs","nDays") %in% names(used)
[1] TRUE TRUE


So alterations made to the copy are also made to the original. Is this a bug? Am I giving them too similar names, or what?

R-version: 3.3
data.table version: 1.9.6

But also experienced in older versions of both R and data.table

Answer

You shouldn't see this behaviour with data.frame, but you will see it for data.table objects.

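?data.table::copy explains that data.table avoids making copies wherever possible. An assignment such as B <- A does not copy the table; both names refer to the same object, so modifying it by reference with := or one of the set* functions changes what both names see. For example: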

library(data.table)
A <- data.table(x=1:10)
B <- A
A[, y:=10:1]

B
##      x  y
##  1:  1 10
##  2:  2  9
##  3:  3  8
##  4:  4  7
##  5:  5  6
##  6:  6  5
##  7:  7  4
##  8:  8  3
##  9:  9  2
## 10: 10  1

A and B are still the same object, so column y shows up in both.
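One way to check whether two names refer to the same table is to compare their memory addresses with data.table's address() function. The following is only a minimal sketch using the same toy names A and B as above; C and the same_* variables are introduced here for illustration.

```r
library(data.table)

A <- data.table(x = 1:10)
B <- A          # no copy is made: B is another name for the same object
C <- copy(A)    # explicit, independent copy

# Equal addresses mean both names point at the same underlying object
same_AB <- address(A) == address(B)
same_AC <- address(A) == address(C)

A[, y := 10:1]  # modifies the shared object by reference

y_in_B <- "y" %in% names(B)  # B follows A
y_in_C <- "y" %in% names(C)  # C was detached by copy()
```

If same_AB is TRUE, any := or set* call on one name will be visible through the other.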

The bottom line is that to make an independent copy of a data.table, you should use copy() instead:

A <- data.table(x=1:10)
B <- copy(A)
A[, y:=10:1]

B
##      x
##  1:  1
##  2:  2
##  3:  3
##  4:  4
##  5:  5
##  6:  6
##  7:  7
##  8:  8
##  9:  9
## 10: 10
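Applied to the situation in the question, this could look roughly like the sketch below. The contents of 'used' are made up for illustration (the real table is not shown in the question); only the column names ID, Year, Season and yDay are taken from the asker's code.

```r
library(data.table)

# Hypothetical stand-in for the asker's 'used' table
used <- data.table(
  ID     = c(1, 1, 1, 2, 2),
  Year   = c(2015, 2015, 2015, 2015, 2015),
  Season = c("summer", "summer", "summer", "winter", "winter"),
  yDay   = c(180, 180, 181, 20, 21)
)

# Independent copy; 'used3 <- used' would share the object instead
used3 <- copy(used)

# Per-group counts added by reference to the copy only
used3[, nDays := uniqueN(yDay), by = c("ID", "Year", "Season")]
used3[, nLocs := .N,            by = c("ID", "Year", "Season")]

in_used  <- c("nLocs", "nDays") %in% names(used)   # original table
in_used3 <- c("nLocs", "nDays") %in% names(used3)  # modified copy
```

With copy() in place, the new columns should appear only in used3 and the original 'used' keeps its four columns.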

Note that using the $ operator to add a column to a data.table does result in a copy being made, so B is left unchanged:

A <- data.table(x=1:10)
B <- A
A$y <- 10:1

B
##      x
##  1:  1
##  2:  2
##  3:  3
##  4:  4
##  5:  5
##  6:  6
##  7:  7
##  8:  8
##  9:  9
## 10: 10
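A consequence worth spelling out is that after such a $ assignment the name A refers to the new copy, so later := calls on A no longer reach B either. A small sketch of this, with the extra column z added purely for illustration:

```r
library(data.table)

A <- data.table(x = 1:10)
B <- A           # A and B share one object at this point

A$y <- 10:1      # $<- assigns a modified copy back to the name A

A[, z := 1L]     # by-reference update now only touches A's new object

names_A <- names(A)
names_B <- names(B)
```

Relying on this side effect is fragile, though; calling copy() explicitly states the intent and does not depend on which assignment form happened to be used.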