Fabian Werner - 1 month ago
R Question

Accessing a non-existent column of a data.table with the dollar sign

I am using version 1.10.4 of data.table (probably outdated) with R version 3.3.2 (2016-10-31). I can access non-existent columns with the dollar sign. Is this behaviour intended?

code:

library(data.table)

realOffloads = data.table(BAG_TAG = c(1, 2, 3))
realOffloads = realOffloads[, .(BAG_TAG, OFFLOAD_REAL = TRUE)]
"OFFLOAD" %in% names(realOffloads)  # FALSE: there is no column named OFFLOAD
x = realOffloads$OFFLOAD            # yet x is TRUE TRUE TRUE


Although the check for whether the column 'OFFLOAD' exists returns FALSE, I still get a value back (TRUE, TRUE, TRUE) when I access it with the dollar sign.

I was using that pretty often in the code so now I am a little scared :-()

Regards,
FW

Answer

From the base R documentation:

Both [[ and $ select a single element of the list. The main difference is that $ does not allow computed indices, whereas [[ does. x$name is equivalent to x[["name", exact = FALSE]]. Also, the partial matching behavior of [[ can be controlled using the exact argument.

A data.table is a data.frame, and a data.frame is a list, thus a data.table is a list. So inexact matching with $ is allowed. That's nice for an interactive session where you're trying to quickly explore the data, but not so nice for non-interactive code that can blow up when the unexpected happens.
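The difference between the two operators can be seen in a minimal base R sketch. It uses a plain data.frame rather than a data.table so that it runs without extra packages, on the assumption (from the reasoning above) that a data.table is treated as a list in the same way. The object name df is made up for illustration.

```r
# Illustrative data.frame (the name `df` is hypothetical) mirroring the question's columns.
df <- data.frame(BAG_TAG = c(1, 2, 3), OFFLOAD_REAL = TRUE)

"OFFLOAD" %in% names(df)                   # FALSE: no column has exactly this name

partial <- df$OFFLOAD                      # `$` partially matches OFFLOAD_REAL
exact   <- df[["OFFLOAD"]]                 # `[[` matches exactly by default, so NULL
loose   <- df[["OFFLOAD", exact = FALSE]]  # the form `$` is documented to be equivalent to

identical(partial, df$OFFLOAD_REAL)        # TRUE
is.null(exact)                             # TRUE
identical(loose, partial)                  # TRUE
```

Base R also documents an option, options(warnPartialMatchDollar = TRUE), that asks R to warn when $ extracts by partial matching. Whether it catches every case for data.table objects in the versions mentioned in the question is not something confirmed here.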

This is why it's almost always a bad idea to use $ for subsetting if you're not 100% sure the column exists. And even then, a typo can silently match a different column. Instead of DT$name, use DT[["name"]], which matches names exactly by default. If the column is missing, this won't raise an error, but it will return NULL, which is easy to check. Even if you don't check, that NULL will probably cause an error further down the line.
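One way to make the NULL check explicit is a small wrapper that fails loudly. This is only a sketch: get_col is a hypothetical helper, not part of base R or data.table, and it is shown on a plain data.frame on the assumption that [[ behaves the same way on a data.table.

```r
# Hypothetical helper: fetch a column by exact name, or stop with a clear error.
get_col <- function(x, name) {
  col <- x[[name]]  # exact matching by default; NULL if the name is absent
  if (is.null(col)) {
    stop("column not found: ", name)
  }
  col
}

df <- data.frame(BAG_TAG = c(1, 2, 3), OFFLOAD_REAL = TRUE)

ok  <- get_col(df, "OFFLOAD_REAL")  # returns the column
err <- tryCatch(
  get_col(df, "OFFLOAD"),           # partial name: no silent match, raises an error
  error = function(e) conditionMessage(e)
)
```

Here ok holds the OFFLOAD_REAL column, while err holds the error message, so a misspelled or shortened name is caught at the point of access instead of later in the pipeline.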