marbel - 2 months ago
R Question

R: How to write a function that gets the levels of a column in a data.table

Some data:

require(data.table)
set.seed(123)
DT <- data.table(factor = c("a", "b", "c"), num = rpois(6, 30))
DT[["factor"]] <- factor(DT[["factor"]])
levels(DT[["factor"]])
# [1] "a" "b" "c"


I'm trying to write a function that gets the levels of a factor column in DT. Here's what I've attempted so far:

get_levels <- function(data, factor){
data = substitute(data)
factor = substitute(factor)
factor_levels = levels(data[["factor"]])
print(factor_levels)
}

get_levels(DT, factor)

get_levels2 <- function(data, factor){
data = substitute(data)
factor = substitute(factor)
factor_levels = levels(data[[factor]])
print(factor_levels)
}

get_levels2(DT, factor)


get_levels3 <- function(data, factor){
data = substitute(data)
factor = substitute(factor)
factor_levels = levels(eval(data[[deparse(factor)]]))
print(factor_levels)
}

get_levels3(DT, factor)


I'm getting this error:

Error in data[["factor"]] : object of type 'symbol' is not subsettable


and this one:

Error in data[[deparse(factor)]] :
object of type 'symbol' is not subsettable


As I don't have much experience programming, I don't know exactly what the purpose is of the functions used for passing variables to functions: substitute, deparse, eval, parse. I've been reading the documentation and I'm not finding it very clear. It would be helpful if someone could explain what each function is for, or point to resources for learning more about them.

Answer

If you're not experienced with R (or programming in general), avoid substitute, deparse, eval, etc. They are rarely necessary. Pass the column name as a character string instead:

library(data.table)

DT <- data.table(f = c("a", "b", "c"), num = rpois(6, 30))
DT[["f"]] <- factor(DT[["f"]])

# fac is the column name as a string, e.g. "f"
get_levels <- function(data, fac) {
  levels(data[[fac]])
}

get_levels(DT, "f")
# [1] "a" "b" "c"

Also, don't name your column factor. That is the name of a base R function, so reusing it for a column is just confusing.
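
For anyone curious what those functions do, here is a rough sketch rather than a recommendation. substitute returns the unevaluated expression passed as an argument, deparse turns an expression into a string, and eval evaluates an expression. In the question, data = substitute(data) replaces the table with the symbol DT, which is why subsetting it fails with "object of type 'symbol' is not subsettable". The names df, inspect and get_levels_nse below are made up for illustration, and a plain data.frame is used so the sketch needs no packages ([[ with a column name works the same way on a data.table).

```r
# Toy data with a factor column f (hypothetical, not the DT from the question)
df <- data.frame(f = factor(c("a", "b", "c", "a", "b", "c")), num = 1:6)

# Hypothetical helper that shows what each function returns
inspect <- function(data, fac) {
  caller    <- parent.frame()    # environment the function was called from
  data_expr <- substitute(data)  # the unevaluated argument: the symbol df
  fac_expr  <- substitute(fac)   # the unevaluated argument: the symbol f
  list(
    data_type  = typeof(data_expr),      # "symbol", not a data.frame
    fac_string = deparse(fac_expr),      # the symbol converted to the string "f"
    data_value = eval(data_expr, caller) # the symbol looked up again: the data
  )
}

res <- inspect(df, f)

# Hypothetical variant of get_levels that accepts an unquoted column name.
# Only fac is captured; data is evaluated normally, so [[ works on it.
get_levels_nse <- function(data, fac) {
  fac_name <- deparse(substitute(fac))  # f becomes "f"
  levels(data[[fac_name]])
}

lv <- get_levels_nse(df, f)
lv
# [1] "a" "b" "c"
```

The string version in the answer is still the easier one to reason about, since a column name stored in a variable can be passed straight through to it.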