Susu Susu - 2 months ago 7
R Question

Converting factors with character names to numerics (after import from an .sav file)

So after I've imported a data.set via memisc (which worked very nicely btw! :)), I now have the problem that almost all of the data is converted to (unordered) factors, and the levels are not 1,2,3,4,5 (which is what they should be for calculations) but rather "fully agree" down to "don't agree at all".

This leads to the problem that I can't use

as.numeric(levels(f))[f]
to convert the factor into numerics.

To import my data I used this:

data <- as.data.set(spss.system.file("data.sav"))
dat <- as.data.frame(data)


However, the information seems to be there:

str(var1)

Factor w/ 5 levels "don't agree at all",..: NA 1 1 1 1 1 1 1 1 1 ...

labels(dat$var1)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12"
[13] "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24"

levels(dat$var1)
[1] "do not agree at all" ". ." ". . ."
[4] ". . . ." "fully agree"


Where are the values stored? I've tried labels(var1) and just var1, but neither works. However, using as.numeric(var1) gives me the information I need, BUT I don't think one should apply this as stated in the R help for factors. Also, after using
dat[,1:ncol(dat)] <- lapply(dat[,1:ncol(dat)], function(x) as.numeric(x))

the variable is still being considered a factor and behaves exactly the same as before.

Edit: Reproducible example thanks to @jakub

var1 <- factor(c(1,2,3,4,5,5,4,3,2,1),
levels = as.character(1:5),
labels = c("Fully agree", "....", "...", "..", "Do not agree at all"))

Answer

You say:

as.numeric(var1) gives me the information I need, BUT I don't think one should apply this as stated in the R help for factors

If you refer to:

In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion.

then you are most likely confusing two issues. You either want the labels, or you want the underlying integer codes.

If you have numerical values that happen to be labels of a factor, then indeed you have to convert to numeric using as.numeric(levels(f))[f]. An example:

var1 <- factor(c(1,2,3,1), 
               labels = c("123", "5", "-11"),
               levels = as.character(1:3))
levels(var1)
# [1] "123" "5"   "-11"
as.numeric(var1)
# [1] 1 2 3 1  # this indeed does not make much sense - the values are lost!
as.numeric(levels(var1))[var1]
# [1] 123   5 -11 123

But in your case this does not apply, because (if I understood correctly) you don't want the labels but the underlying integers. For you, it makes sense that "Fully agree" means 1. In such a case, as.numeric(var1) is fine.
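To make this concrete, here is a minimal sketch using the reproducible example from the question. It assumes that the order of the factor levels matches the original 1 to 5 coding, which is worth checking with levels() on your imported data. The data frame `dat` and its `id` column below are made up for illustration and only stand in for your imported data.

```r
# Reproducible example from the question: position 1 is "Fully agree".
var1 <- factor(c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1),
               levels = as.character(1:5),
               labels = c("Fully agree", "....", "...", "..", "Do not agree at all"))

# The integer codes are the positions of each value within levels(var1).
codes <- as.integer(var1)
codes
# [1] 1 2 3 4 5 5 4 3 2 1

# Hypothetical data frame standing in for the imported `dat`.
dat <- data.frame(var1 = var1, id = seq_along(var1))

# Convert only the factor columns, leaving other columns untouched.
is_fac <- vapply(dat, is.factor, logical(1))
dat[is_fac] <- lapply(dat[is_fac], as.integer)
str(dat)
```

Restricting the conversion to factor columns avoids touching variables that were imported as numeric or character. If the levels in your data are not in the order of the original codes, the integers will follow the level order, not the SPSS values.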