Bhail - 5 days ago
R Question

What is factor(1 * (hp>200), labels=c("weak","good")) doing?

I am trying to work out where I should look to figure out what

1 * (hp>200)

is doing inside factor(), and, for that matter, how I could use it myself.

> library(dplyr)   # provides mutate()
> test<- mutate(mtcars, HPcat=factor(1 * (hp > 175), labels=c("weak","good")))
> str(test['HPcat'])
'data.frame': 32 obs. of 1 variable:
$ HPcat: Factor w/ 2 levels "weak","good": 1 1 1 1 1 1 2 1 1 1 ...
>
> # Changing the 1 to a different number doesn't change anything
>
> test<- mutate(mtcars, HPcat=factor(100 * (hp > 175), labels=c("weak","good")))
> str(test['HPcat'])
'data.frame': 32 obs. of 1 variable:
$ HPcat: Factor w/ 2 levels "weak","good": 1 1 1 1 1 1 2 1 1 1 ...

Answer

R allows logical TRUE/FALSE values to be used in numeric expressions, coercing FALSE to 0 and TRUE to 1. So the expression 1*(hp>200) (often written in the alternate form 0+(hp>200)) is a way of performing this conversion: wherever hp exceeds 200 the value is 1, otherwise it is 0.
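A minimal sketch of that coercion, using a small made-up hp vector (hypothetical values, not taken from mtcars as a whole):

```r
# Hypothetical horsepower values for illustration
hp <- c(110, 245, 93, 335)

is_big <- hp > 200     # logical vector: FALSE TRUE FALSE TRUE
num1 <- 1 * is_big     # arithmetic coerces to numeric: 0 1 0 1
num2 <- 0 + is_big     # alternate spelling, same numeric result

print(num1)
print(sum(is_big))     # the same coercion lets sum() count the TRUE values
```

The same rule is why sum() and mean() on a logical vector give a count and a proportion of TRUE values.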

When factor() is given a vector of 0s and 1s without a levels= argument, it uses the sorted unique values as the levels, so the result has two levels, 0 and 1, in that order. The labels= argument then renames those levels, in order, to weak and good. If you use 100*(hp>200) instead, the vector of 0s and 100s becomes a factor with levels 0 and 100, which labels= renames in exactly the same way, giving the same final result. That is why changing the multiplier made no difference.
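A short sketch (again with made-up hp values) showing the default levels factor() picks for the 0/1 and 0/100 vectors, and that labels= maps both onto the same weak/good factor:

```r
# Hypothetical horsepower values for illustration
hp <- c(110, 245, 93, 335)

codes_1   <- 1   * (hp > 200)   # 0 1 0 1
codes_100 <- 100 * (hp > 200)   # 0 100 0 100

# Without levels=, factor() uses the sorted unique values as levels
print(levels(factor(codes_1)))     # "0" "1"
print(levels(factor(codes_100)))   # "0" "100"

# labels= renames those levels in order: the smaller value becomes "weak"
f1   <- factor(codes_1,   labels = c("weak", "good"))
f100 <- factor(codes_100, labels = c("weak", "good"))
print(as.character(f1))            # "weak" "good" "weak" "good"
```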

This code will fail if all hp values are <=200, or all are >200, because it relies on the converted vector containing both 0s and 1s. When only one value is present, factor() finds a single level but is handed two labels, and it stops with an error.
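A sketch of that failure case, assuming hypothetical data in which no hp value exceeds 200. Passing levels=c(0, 1) explicitly is one way to keep the original idiom from breaking, since the set of levels then no longer depends on the data:

```r
# Hypothetical data: no value exceeds 200, so every code is 0
hp_low <- c(110, 93, 175)

# One distinct value means one level, but two labels are supplied -> error
res <- tryCatch(
  factor(1 * (hp_low > 200), labels = c("weak", "good")),
  error = function(e) conditionMessage(e)
)
print(res)   # the error message, returned as a character string

# Fixing the levels up front keeps both categories even if one is empty
safe <- factor(1 * (hp_low > 200), levels = c(0, 1), labels = c("weak", "good"))
print(levels(safe))   # "weak" "good"
print(table(safe))    # weak: 3, good: 0
```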

This is terrible code, and I would suggest not emulating it. Much clearer would have been:

factor(ifelse(hp>175,"good","weak"), levels=c("weak","good"))
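For comparison, a base R sketch (using with() in place of dplyr's mutate(), an assumption made only to keep the example dependency-free) checking that the ifelse() version produces the same factor as the original on mtcars:

```r
# with() exposes the mtcars columns, so hp can be used directly
original <- with(mtcars, factor(1 * (hp > 175), labels = c("weak", "good")))
clearer  <- with(mtcars, factor(ifelse(hp > 175, "good", "weak"),
                                levels = c("weak", "good")))

# Same levels, in the same order, and the same value for every car
print(identical(levels(original), levels(clearer)))
print(identical(as.character(original), as.character(clearer)))
print(table(clearer))
```

Because levels= is given explicitly in the ifelse() version, it also keeps both levels when every hp falls on the same side of the cutoff, which is exactly the case where the original breaks.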