user2979010 - 1 month ago
R Question

How do I pass column name as variable to data.table in R?

I would like to pass a variable (that holds the column name as a string) as an argument to data.table. How do I do it?

Consider the data.table below:

library(data.table)

myvariable <- "a"
myvariable_2 <- "b"

DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
DT
#    ID a  b  c
# 1:  b 1  7 13
# 2:  b 2  8 14
# 3:  b 3  9 15
# 4:  a 4 10 16
# 5:  a 5 11 17
# 6:  c 6 12 18



  1. I can use subset to extract columns, i.e. subset(DT, TRUE, myvariable), but this just outputs the column(s).

  2. How do I use subset to extract a column based on some criterion? E.g. extract the myvariable column where myvariable_2 < 10.

  3. How do I extract summary statistics over groups by passing column names as variables?

  4. How do I plot descriptive plots using data.table by passing column names as variables?



I know that this could be easier in data.frame, i.e. passing variables as column names. But I read everywhere that data.table is faster and more memory efficient, hence I would like to stick with it.

Does switching between data.table and data.frame have huge memory/performance implications?

I do not want to explicitly code the column names as I want this piece of code to be re-usable.

Answer

The comment from @thelatemail is a very good start. Do read that first! Another quick way is shown below:

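library(data.table)
df = data.table(a = 1:10, b = letters[1:2], c = 11:20)

# column names held as strings
var1 = "a"
var2 = "b"

# with = FALSE makes j a character vector of column names
dt1 = df[, c(var1, var2), with = FALSE]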

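Think of with = FALSE as making data.table behave like data.frame: j is then read as a vector of column names (or positions) rather than as an expression evaluated inside the table.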
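
The same idea can probably be stretched to items 2 and 3 of the question. What follows is only a minimal sketch, not necessarily what the comment mentioned above suggests: it reuses DT, myvariable and myvariable_2 from the question, and group_var is a name made up here for illustration. It assumes get() to look a column up by name inside i, and .SDcols plus a character by for the grouped summary.

```r
library(data.table)

# Table and variables copied from the question
DT <- data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
myvariable   <- "a"
myvariable_2 <- "b"
group_var    <- "ID"   # hypothetical variable, not in the original post

# Item 2: keep rows where the column named by myvariable_2 is below 10,
# and return only the column named by myvariable.
# get() resolves the string to the column inside i.
filtered <- DT[get(myvariable_2) < 10, myvariable, with = FALSE]

# Item 3: mean of the column named by myvariable within each group.
# .SDcols selects columns by name; by accepts a character vector of names.
group_means <- DT[, lapply(.SD, mean), by = group_var, .SDcols = myvariable]

print(filtered)
print(group_means)
```

One thing to watch for: a variable holding a column name should not itself share a name with a column of the table, since data.table looks for columns first.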