I am working with a dataset that has some fairly absurdly long IDs. After not getting the result I expected, I traced the issue to a step of code where I merged two data tables.
options("scipen" = 999) ## The only way I know to disable scientific notation.
table1 <- data.table("ID" = sample(400000000000:400000000004, size = 5),
"colA" = rep('a', 5))
table2 <- data.table("ID" = sample(400000000000:400000000001, size = 2),
"colB" = rep('b', 2))
merge(table1, table2, by = "ID", all.x = F)
all.x = F
ID colA colB
1: 400000000000 a b
2: 400000000000 a b
3: 400000000001 a b
4: 400000000001 a b
merge documentation :
Merges on numeric columns: Columns of numeric types (i.e., double) have their last two bytes rounded off while computing order, by defalult, to avoid any unexpected behaviour due to limitations in representing floating point numbers precisely. For large numbers (integers > 2^31), we recommend using bit64::integer64. Have a look at setNumericRounding to learn more.
The solution here is to use
setNumericRounding(0) merge(table1, table2, by = "ID", all.x = F) ID colA colB 1: 400000000000 a b 2: 400000000001 a b