jamborta - 7 months ago
R Question

select multiple columns in data.table R

I couldn't find the answer for this simple question.

What's the data.table equivalent of selecting multiple columns by position, like this in a data.frame?

df <- data.frame(a=1,b=2,c=3)
df[, 2:3]  # select columns 2 and 3 by position



Just set with = FALSE:

dt <- data.table(a=1:2, b=2:3, c=3:4)
dt[, 2:3, with = FALSE]
#    b c
# 1: 2 3
# 2: 3 4
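For what it's worth, more recent data.table releases (1.9.8 and later, as far as I know) switch to with = FALSE automatically when j contains only positions or quoted names, and newer ones also accept a `..` prefix for names stored in a variable. A small sketch, assuming a reasonably current data.table install; `cols` is just an illustrative variable name:

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4)
cols <- c("b", "c")  # illustrative variable, not part of the question

# Assumes a recent data.table: no with = FALSE needed for these forms
res_pos  <- dt[, 2:3]           # by position
res_name <- dt[, c("b", "c")]   # by quoted name
res_var  <- dt[, ..cols]        # names taken from a variable in the calling scope
```

On older versions these calls behave differently, so with = FALSE remains the portable spelling.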

As far as I can tell, the argument is named "with" because it determines whether the column index should be evaluated within the frame of the data.table, as it would be when using, e.g., base R's with() and within().
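To make the analogy with base R's with() concrete, here is a minimal sketch (the variable names are only for illustration). With the default, j is an expression in which column names act as variables; with = FALSE turns j into a plain description of which columns to return:

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4)
cols <- c("b", "c")  # illustrative variable holding column names

# Default (with = TRUE): j is evaluated inside the table, like base R's with()
sums_dt   <- dt[, b + c]
sums_base <- with(dt, b + c)

# with = FALSE: j is treated as data (names or positions), not as an expression
picked <- dt[, cols, with = FALSE]
```

Here `sums_dt` and `sums_base` are the same integer vector, while `picked` is a data.table holding columns b and c.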

From the description of the with argument in ?data.table:

By default with=TRUE and j is evaluated within the frame of x. The column names can be used as variables.

When with=FALSE j is a character vector of column names, a numeric vector of column positions to select or of the form startcol:endcol, and the value returned is always a data.table...
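A quick sketch of two of the forms the help text mentions, a character vector of names and a numeric vector of positions, checking that they pick the same columns from the example table:

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

by_name <- dt[, c("b", "c"), with = FALSE]  # character vector of column names
by_pos  <- dt[, c(2, 3), with = FALSE]      # numeric vector of column positions

# Both selections return a data.table with the same columns and values
same <- isTRUE(all.equal(by_name, by_pos))
```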

And there is some related thinking in ?setkey:

It isn't good programming practice, in general, to use column numbers rather than names. [...] If you use column numbers then bugs (possibly silent) can more easily creep into your code as time progresses if changes are made elsewhere in your code; e.g., if you add, remove or reorder columns in a few months time, a setkey [or a select] by column number will then refer to a different column, possibly returning incorrect results with no warning. (A similar concept exists in SQL where "select * from ..." is considered poor programming style [by some] when a robust, maintainable system is required.) If you really wish to use column numbers, it's possible but deliberately a little harder; e.g., setkeyv(DT,colnames(DT)[1:2]) [or setting with=FALSE in selects].
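The point about silent bugs is easy to reproduce. In this sketch the reordering via setcolorder() stands in for "changes made elsewhere in your code"; the selection by position quietly returns different columns afterwards, while the selection by name does not:

```r
library(data.table)

dt <- data.table(a = 1:2, b = 2:3, c = 3:4)

pos_before  <- names(dt[, 2:3, with = FALSE])           # "b" "c"
name_before <- names(dt[, c("b", "c"), with = FALSE])   # "b" "c"

# Simulate a later change elsewhere: columns are reordered by reference
setcolorder(dt, c("c", "a", "b"))

pos_after  <- names(dt[, 2:3, with = FALSE])            # now "a" "b", no warning
name_after <- names(dt[, c("b", "c"), with = FALSE])    # still "b" "c"
```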