Bhail - 8 months ago 41

R Question

I am trying to figure out where I should be looking to understand what the `1 * (hp>200)` part does.

```
> test<- mutate(mtcars, HPcat=factor(1 * (hp > 175), labels=c("weak","good")))
> str(test['HPcat'])
'data.frame': 32 obs. of 1 variable:
 $ HPcat: Factor w/ 2 levels "weak","good": 1 1 1 1 1 1 2 1 1 1 ...
> # So changing the 1 to a different number doesn't seem to do anything:
> test<- mutate(mtcars, HPcat=factor(100 * (hp > 175), labels=c("weak","good")))
> str(test['HPcat'])
'data.frame': 32 obs. of 1 variable:
 $ HPcat: Factor w/ 2 levels "weak","good": 1 1 1 1 1 1 2 1 1 1 ...
```

Answer

R allows a logical `TRUE`/`FALSE` value to be used in numeric expressions, treating `FALSE` as `0` and `TRUE` as `1`. So the expression `1*(hp>200)` (often written in the alternate form `0+(hp>200)`) is a way of performing this conversion: whenever `hp` exceeds 200, the value is 1; otherwise it's 0.
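A minimal sketch of that coercion, using a small made-up vector of horsepower values instead of the full `mtcars` data (the values and the variable names `flags`, `one_times`, `zero_plus` and `explicit` are illustrative only):

```r
# Hypothetical horsepower values, chosen only for illustration
hp <- c(110, 245, 93, 335)

flags <- hp > 200              # logical: FALSE TRUE FALSE TRUE
one_times <- 1 * flags         # arithmetic coerces FALSE to 0 and TRUE to 1
zero_plus <- 0 + flags         # the alternate form gives the same numbers
explicit <- as.numeric(flags)  # explicit conversion, for comparison

print(one_times)                        # [1] 0 1 0 1
print(identical(one_times, zero_plus))  # [1] TRUE
print(identical(one_times, explicit))   # [1] TRUE
```

Calling `as.numeric()` (or `as.integer()`) states the intent more directly than multiplying by 1, even though the result is the same.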

The `factor` function, when given a vector of 0s and 1s, turns it into a factor with two levels, `0` and `1`, in that (sorted) order. The `labels=` argument then renames those levels, in the same order, to `weak` and `good`. If you use `100*(hp>200)`, the vector of 0s and 100s turns into a factor with two levels, `0` and `100`, which the `labels=` argument renames in the same way, giving the same final answer.
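A short sketch of why the multiplier does not matter, assuming a hypothetical logical vector `flags` standing in for `hp > 200`:

```r
flags <- c(FALSE, TRUE, FALSE, TRUE)  # hypothetical results of hp > 200

# factor() uses the sorted distinct values as its levels
print(levels(factor(1 * flags)))    # [1] "0" "1"
print(levels(factor(100 * flags)))  # [1] "0"   "100"

# labels= renames those levels in that sorted order, so 0 becomes "weak"
# in both cases and the nonzero value becomes "good"
f1 <- factor(1 * flags, labels = c("weak", "good"))
f100 <- factor(100 * flags, labels = c("weak", "good"))
print(as.character(f1))     # [1] "weak" "good" "weak" "good"
print(identical(f1, f100))  # [1] TRUE
```

Any positive multiplier gives the same result, since 0 still sorts first. A negative one such as `-1*(hp>200)` would sort the `TRUE` values first and swap the labels.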

This code will fail if all `hp` values are <=200 or all are >200, because it relies on the converted vector containing both 0s and 1s. With only one distinct value, `factor` finds a single level but is given two labels, and it stops with an error.
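To see that failure, here is a minimal sketch with a hypothetical input in which no value exceeds 200; `tryCatch()` is used only so the error message can be captured and printed instead of stopping the script:

```r
all_low <- c(FALSE, FALSE, FALSE)  # hypothetical: no hp value exceeds 200

# Only one distinct value (0), so factor() finds one level but is given
# two labels; it signals an error, which tryCatch() captures here.
result <- tryCatch(
  factor(1 * all_low, labels = c("weak", "good")),
  error = function(e) conditionMessage(e)
)
print(result)  # the error message text, not a factor
```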

This is terrible code, and I would suggest not emulating it. Much clearer would have been:

```
factor(ifelse(hp>175,"good","weak"), levels=c("weak","good"))
```
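A sketch of the suggested version in use. It reads `mtcars$hp` directly so it runs in base R without `dplyr`; inside `mutate()`, as in the question, the same expression can refer to `hp` directly. The names `hp_cat`, `old`, `low` and `low_cat` are illustrative:

```r
hp <- mtcars$hp  # mtcars ships with base R (datasets package)

# The suggested version: label directly, and fix the level order with levels=
hp_cat <- factor(ifelse(hp > 175, "good", "weak"), levels = c("weak", "good"))

# On mtcars it agrees with the original 1 * (hp > 175) version...
old <- factor(1 * (hp > 175), labels = c("weak", "good"))
print(identical(as.character(hp_cat), as.character(old)))  # [1] TRUE

# ...and it still has both levels when every value is on one side of 175
low <- c(93, 110)  # hypothetical input where every car is "weak"
low_cat <- factor(ifelse(low > 175, "good", "weak"), levels = c("weak", "good"))
print(levels(low_cat))  # [1] "weak" "good"
```

Because `levels=` lists the categories explicitly, the factor keeps the same two levels in the same order regardless of which values happen to be present in the data.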