Crystal - 2 months ago
R Question

use condition in data frame with "with" function in R

My data looks like this:

  manager     date country gender age q1 q2 q3 q4 q5 agecat
1       1 10/24/08      US      M  NA  5  4  5  5  5     NA
2       2 10.28/08      US      F  45  3  5  2  5  5     NA
3       3  10/1/08      UK      F  NA  3  5  5  5  2     NA
4       4 10/12/08      UK      M  39  3  3  4 NA NA     NA
5       5   5/1/09      UK      F  99  2  2  1  2  1     NA


Now I am trying to set agecat = "Elder" if age > 55. I tried the following two pieces of code and got different results:

Code 1 (worked)

leadership$agecat[leadership$age > 55] <- "Elder"


Code 2 (didn't work)

with(leadership, {
    agecat[age > 55] <- "Elder"
})


Can anyone help me understand the difference between the two and why the second one doesn't work? Many thanks!

Answer

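Firstly, with evaluates your expression

{
    agecat[age > 55] <- "Elder"
}

in a temporary environment built from leadership, so the assignment changes only a local copy of agecat and the data frame itself is never updated. The call does return a value, but it is just the invisible result of the assignment ("Elder"), not the modified column. There are plenty of ways to do this, and using with in this situation actually seems a little clunky because you would have to do the following. Notice that the column is returned on the second line of the expression.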

leadership$agecat <- with(leadership, {
    agecat[age > 55] <- "Elder"
    agecat
})
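
To see the difference concretely, here is a minimal sketch. The construction of leadership below is an assumption: it rebuilds only the manager, age, and agecat columns from the question's printed data. It runs the bare with call, checks that the data frame is untouched, and then runs the version that returns the column.

```r
# Assumed reconstruction of the relevant columns from the question's data
leadership <- data.frame(
  manager = 1:5,
  age     = c(NA, 45, NA, 39, 99),
  agecat  = NA
)

# Bare with(): the assignment happens in a temporary environment,
# so leadership itself is not modified
res <- with(leadership, {
  agecat[age > 55] <- "Elder"
})
res                          # value of the assignment (its right-hand side)
before <- leadership$agecat  # snapshot of the column after the bare call
before                       # still all NA

# Returning the modified column and assigning it back performs the update
leadership$agecat <- with(leadership, {
  agecat[age > 55] <- "Elder"
  agecat
})
leadership$agecat
```

Rows where age is NA are left alone, because an NA in a logical index selects nothing when the replacement value has length one.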

Not really a streamlined with call, imo. You could clean it up a bit with

leadership$agecat <- with(leadership, replace(agecat, age > 55, "Elder"))

which is basically the same thing, just packed into a function. But you could also use within, which evaluates the expression, applies the changes to a copy of the data, and returns that copy (so we still need to assign the result).

leadership <- within(leadership, agecat[age > 55] <- "Elder")

And as @BenBolker notes, transform is another option. This gives the full updated data back as well, same as within.

leadership <- transform(leadership, agecat = replace(agecat, age > 55, "Elder"))
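
As a quick sanity check, the sketch below applies all four approaches to fresh copies of the data and compares the resulting agecat columns. The helper make_leadership is hypothetical and rebuilds only the manager, age, and agecat columns from the question.

```r
# Hypothetical helper: assumed reconstruction of the relevant columns
make_leadership <- function() {
  data.frame(manager = 1:5, age = c(NA, 45, NA, 39, 99), agecat = NA)
}

# Code 1 from the question: direct indexing
by_index <- make_leadership()
by_index$agecat[by_index$age > 55] <- "Elder"

# with() plus replace(), assigning the returned column
by_with <- make_leadership()
by_with$agecat <- with(by_with, replace(agecat, age > 55, "Elder"))

# within() returns the modified data frame
by_within <- within(make_leadership(), agecat[age > 55] <- "Elder")

# transform() also returns the modified data frame
by_transform <- transform(make_leadership(),
                          agecat = replace(agecat, age > 55, "Elder"))

# Compare each result with Code 1; as.character() guards against
# string-to-factor conversion in older R versions
results <- list(with = by_with, within = by_within, transform = by_transform)
agree <- vapply(
  results,
  function(df) identical(as.character(df$agecat), as.character(by_index$agecat)),
  logical(1)
)
agree
```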

Long story short, it might be best to stick with your Code 1 for this.
