Susu - 1 year ago 72
R Question

# Converting factors with character names to numerics (after import from an .sav file)

So after I've imported a data.set via memisc (which worked very nicely btw! :)), I now have the problem that almost all of the data is converted to (non-ordered) factors, but the levels are not 1, 2, 3, 4, 5 (which is what they should be for calculations) but rather "fully agree" down to "don't agree at all".

This leads to the problem that I can't use `as.numeric(levels(f))[f]` to convert the factor into numerics.

To import my data I used this:

```r
library(memisc)  # provides spss.system.file() and as.data.set()

data <- as.data.set(spss.system.file("data.sav"))
dat <- as.data.frame(data)
```

However, the information seems to be there:

```
str(var1)

Factor w/ 5 levels "don't agree at all",..: NA 1 1 1 1 1 1 1 1 1 ...

labels(dat$var1)
[1] "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"
[13] "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"

levels(dat$var1)
[1] "do not agree at all" ". ."                 ". . ."
[4] ". . . ."             "fully agree"
```

Where are the values stored? I've tried `labels(var1)` and just `var1`, but neither works. However, using `as.numeric(var1)` gives me the information I need, BUT I don't think one should apply this, as stated in the R help for factors. Also, after using

`dat[,1:ncol(dat)] <- lapply(dat[,1:ncol(dat)], function(x) as.numeric(x))`

the variable is still being considered a factor and behaves exactly the same as before.

Edit: Reproducible example thanks to @jakub

```r
var1 <- factor(c(1,2,3,4,5,5,4,3,2,1),
               levels = as.character(1:5),
               labels = c("Fully agree", "....", "...", "..", "Do not agree at all"))
```

Answer Source

You say:

> `as.numeric(var1)` gives me the information I need, BUT I don't think one should apply this as stated in the R help for factors

If you are referring to this passage from the help page:

> In particular, `as.numeric` applied to a factor is meaningless, and may happen by implicit coercion.

then you are most likely confusing two issues. You want either the labels (the strings shown by `levels()`) or the underlying integer codes.

If you have numerical values that happen to be labels of a factor, then indeed you have to convert to numeric using `as.numeric(levels(f))[f]`. An example:

```r
var1 <- factor(c(1,2,3,1),
               labels = c("123", "5", "-11"),
               levels = as.character(1:3))
levels(var1)
# [1] "123" "5"   "-11"
as.numeric(var1)
# [1] 1 2 3 1  # this indeed does not make much sense - the values are lost!
as.numeric(levels(var1))[var1]
# [1] 123   5 -11 123
```

But in your case, this does not apply, because (if I understood correctly) you don't want the labels, but the underlying integers. For you, it makes sense that `Fully agree` means `1`. In such a case, `as.numeric(var1)` is fine.
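Applied to the reproducible example from the question, this could look like the minimal sketch below. It uses base R only. The data frame `dat` and the column `var2` are made up for illustration, and `as.integer()` is used instead of `as.numeric()` simply to get integer rather than double storage; the values are the same.

```r
var1 <- factor(c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1),
               levels = as.character(1:5),
               labels = c("Fully agree", "....", "...", "..", "Do not agree at all"))

# Integer code of each value = position of its label in levels(var1)
codes <- as.integer(var1)
codes
# [1] 1 2 3 4 5 5 4 3 2 1

# Hypothetical data frame: replace every factor column by its integer codes
dat <- data.frame(var1 = var1,
                  var2 = factor(rep(c("yes", "no"), 5)))
is_fac <- vapply(dat, is.factor, logical(1))
dat[is_fac] <- lapply(dat[is_fac], as.integer)
str(dat)
```

The codes follow the order of `levels()`, so it is worth checking that order before relying on them. They are positions `1..k`, so they only equal the original SPSS values if those were coded as consecutive integers starting at 1.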
