Susu - 4 months ago 12

R Question

So after I've imported a data.set via memisc (which worked very nicely btw! :)), I now have the problem that almost all of the data is converted to (non-ordered) factors, but the levels are not 1,2,3,4,5 (which is what they should be for calculations) but rather "fully agree" down to "don't agree at all".

This leads to the problem that I can't use

`as.numeric(levels(f))[f]`

To import my data I used this:

```
data <- as.data.set(spss.system.file("data.sav"))
dat <- as.data.frame(data)
```

However, the information seems to be there:

```
str(var1)
# Factor w/ 5 levels "don't agree at all",..: NA 1 1 1 1 1 1 1 1 1 ...

labels(dat$var1)
# [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12"
# [13] "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24"

levels(dat$var1)
# [1] "do not agree at all" ". ." ". . ."
# [4] ". . . ." "fully agree"
```

Where are the values stored? I've tried

```
labels(var1)
var1
as.numeric(var1)
dat[,1:ncol(dat)] <- lapply(dat[,1:ncol(dat)], function(x) as.numeric(x))
```

but the variable is still being considered a factor and behaves exactly the same as before.

Edit: Reproducible example thanks to @jakub

```
var1 <- factor(c(1,2,3,4,5,5,4,3,2,1),
               levels = as.character(1:5),
               labels = c("Fully agree", "....", "...", "..", "Do not agree at all"))
```

Answer

You say:

"`as.numeric(var1)` gives me the information I need, BUT I don't think one should apply this as stated in the R help for factors"

If you are referring to this passage:

"In particular, `as.numeric` applied to a factor is meaningless, and may happen by implicit coercion."

then you are most likely confusing two issues. You either want the *labels* (the text of each level), or you want the underlying integer codes that index the *levels*.

If you have numerical values that happen to be labels of a factor, then indeed you have to convert to numeric using `as.numeric(levels(f))[f]`. An example:

```
var1 <- factor(c(1, 2, 3, 1),
               labels = c("123", "5", "-11"),
               levels = as.character(1:3))
levels(var1)
# [1] "123" "5"   "-11"
as.numeric(var1)
# [1] 1 2 3 1   # this indeed does not make much sense - the values are lost!
as.numeric(levels(var1))[var1]
# [1] 123   5 -11 123
```

**But in your case, this does not apply,** because (if I understood correctly), you don't want the labels, but the underlying integers. For you, it makes sense that `Fully agree` means `1`. In that case, `as.numeric(var1)` is fine.
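To make this concrete, here is a minimal sketch using the reproducible example from the question. The data frame `dat` below is a hypothetical stand-in for the imported data (the `id` column is made up for illustration), and converting only the factor columns is one possible approach, not necessarily what fits your real file.

```r
# Reproducible example from the question: text labels on a 5-point scale
var1 <- factor(c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1),
               levels = as.character(1:5),
               labels = c("Fully agree", "....", "...", "..", "Do not agree at all"))

# The levels are text, so turning them into numbers yields only NA
# (R also emits a coercion warning, suppressed here)
suppressWarnings(as.numeric(levels(var1))[var1])

# The underlying integer codes are what the calculations need
codes <- as.integer(var1)
codes
# [1] 1 2 3 4 5 5 4 3 2 1

# Hypothetical stand-in for `dat`: convert factor columns, leave the rest alone
dat <- data.frame(var1 = var1, id = seq_along(var1))
dat[] <- lapply(dat, function(x) if (is.factor(x)) as.integer(x) else x)
str(dat)
```

The codes follow the order of `levels(var1)`, so it is worth checking that order first: the first level becomes `1`, whichever label it carries.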