j_5chneider - 3 months ago 8
R Question

R - Obtain the connection between the numeric values and the level labels in a factor

I'm struggling to find the connection between the numeric (integer) values stored in an R factor object and its level labels. I know how to define the levels and the labels myself. But suppose I am given an unfamiliar data set that contains several factors (here: sex and color):

test <- data.frame(
  factor(c(1,2,1,1,2,2,1),
         levels = c(1,2),
         labels = c("female", "male")
  ),
  factor(c(3,2,2,1,4,4,5),
         levels = c(1,2,3,4,5),
         labels = c("red", "green", "blue", "yellow", "brown")
  )
)

names(test) <- c("sex", "color")
test

     sex  color
1 female   blue
2   male  green
3 female  green
4 female    red
5   male yellow
6   male yellow
7 female  brown


I can obtain the level labels with attributes(), and I can obtain the numeric values with, e.g., test$sex <- as.numeric(test$sex).


But how do I know that 1 equals female and 2 equals male? The same problem (even worse) applies to the colors. How do I establish the connection?

Thanks

Answer

As others have said, the integer values are simply the positions of the labels in levels(): the first level is stored as 1, the second as 2, and so on. Personally, I find this easiest to visualize in a reference table.

test <- data.frame(
  sex = factor(c(1,2,1,1,2,2,1),
               levels= c(1,2),
               labels = c("female", "male")
  ),
  color = factor(c(3,2,2,1,4,4,5),
                levels= c(1,2,3,4,5),
                labels= c("red", "green", "blue", "yellow", "brown")
  )
)

# Make a reference table
data.frame(level = seq_along(levels(test$color)),
           label = levels(test$color))

  level  label
1     1    red
2     2  green
3     3   blue
4     4 yellow
5     5  brown
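
One way to check the correspondence on the data itself is to index the levels with the integer codes. This is only a minimal sketch, assuming the same color values as above; the names color and codes are made up for illustration.

```r
# Same color factor as in the example above
color <- factor(c(3, 2, 2, 1, 4, 4, 5),
                levels = 1:5,
                labels = c("red", "green", "blue", "yellow", "brown"))

# Integer codes stored inside the factor
codes <- as.integer(color)

# Indexing the levels by the codes gives back the labels
levels(color)[codes]

# The codes are positions in levels(color), so this comparison is TRUE
identical(levels(color)[codes], as.character(color))
```

If the last line returns TRUE, the reference table describes exactly how the codes in that column map to labels.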

If you want to get the references for all of the factors in a data frame, you can wrap the code in a function that applies it to every column and keeps only the factors:

factor_reference <- function(data)
{
  Ref <- 
    lapply(data,
           function(x)
           {
             if (is.factor(x)) data.frame(level = seq_along(levels(x)),
                                          label = levels(x))
             else NULL
           }
    )

  Ref[!vapply(Ref, is.null, logical(1))]
}

factor_reference(test)
$sex
  level  label
1     1 female
2     2   male

$color
  level  label
1     1    red
2     2  green
3     3   blue
4     4 yellow
5     5  brown
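
Keep in mind that a reference table reports positions in levels(), which need not match the raw values a factor was originally built from. The sketch below uses a made-up factor, sex_rev (not part of the original data), whose levels are deliberately listed in reverse order to illustrate this.

```r
# Hypothetical factor built with the levels listed in reverse order
sex_rev <- factor(c(1, 2, 1, 1, 2, 2, 1),
                  levels = c(2, 1),
                  labels = c("male", "female"))

# Codes follow the order of levels(): "male" is 1, "female" is 2
data.frame(level = seq_along(levels(sex_rev)),
           label = levels(sex_rev))

# The raw value 1 is stored as code 2, because "female" is the second level
as.integer(sex_rev)
```

So for an unfamiliar data set, the table tells you how the stored codes map to labels, but not necessarily which raw values were used when the factor was created.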