Ben Ben - 3 months ago 12
R Question

Why use as.factor() instead of just factor()?

Possibly a dumb question, but I recently saw Matt Dowle write some code with as.factor(), specifically

for (col in names_factors) set(dt, j=col, value=as.factor(dt[[col]]))

in a comment to this answer. I used this snippet, but I needed to explicitly set the factor levels, so I had to change as.factor(dt[[col]]) to factor(dt[[col]], levels=my_levels). This got me thinking: what (if any) is the benefit of using as.factor() versus just factor()?

Answer

as.factor() is a wrapper for factor(), but it returns immediately if the input vector is already a factor:

function (x) 
{
    if (is.factor(x)) 
        x
    else if (!is.object(x) && is.integer(x)) {
        levels <- sort(unique.default(x))
        f <- match(x, levels)
        levels(f) <- as.character(levels)
        if (!is.null(nx <- names(x))) 
            names(f) <- nx
        class(f) <- "factor"
        f
    }
    else factor(x)
}
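
The three branches of that source can be checked directly. This is only a small sketch; the vectors f, i and s are made up for illustration and are not from the original post.

```r
# Branch 1: an existing factor is returned unchanged, unused level "z" included.
f <- factor(c("x", "y"), levels = c("x", "y", "z"))
identical(as.factor(f), f)          # TRUE

# Branch 2: a plain integer vector takes the fast path;
# its levels are the sorted unique values, stored as character.
i <- c(3L, 1L, 3L)
levels(as.factor(i))                # "1" "3"

# Branch 3: anything else (here a character vector) is handed to factor().
s <- c("b", "a", "b")
identical(as.factor(s), factor(s))  # TRUE
```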

Per @Frank: Stating the obvious here, but it's not a mere wrapper, since this "quick return" will leave factor levels and ordered-ness alone, while factor() will not:

f = factor("a", levels=c("a","b"))
f
#[1] a
#Levels: a b

factor(f)
#[1] a
#Levels: a

as.factor(f)
#[1] a
#Levels: a b
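
For the situation in the question, the practical upshot is that as.factor() takes no levels argument, so an explicit level set has to go through factor(). Here is a rough base-R sketch; the data frame df, its column size, and the values in my_levels are hypothetical and stand in for the asker's data.table columns.

```r
# Hypothetical data and level set, for illustration only.
df <- data.frame(size = c("small", "large", "small"), stringsAsFactors = FALSE)
my_levels <- c("small", "medium", "large")

# as.factor() infers the levels from the data (sorted unique values).
levels(as.factor(df$size))   # "large" "small"

# factor() lets you fix the level set and its order, unused levels included.
df$size <- factor(df$size, levels = my_levels)
levels(df$size)              # "small" "medium" "large"

# Once the column is a factor, as.factor() leaves it alone,
# whereas a bare factor() call drops the unused level "medium".
levels(as.factor(df$size))   # "small" "medium" "large"
levels(factor(df$size))      # "small" "large"
```

So as.factor() is the safer choice in a loop that may touch columns that are already factors, and factor() is the one to reach for when the levels themselves need to be set or reset.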