Brian D - 1 year ago
R Question

# Cut a variable differently based on another grouping variable

Example: I have a dataset of heights by gender.
I'd like to split the heights into low and high, where the cut point is the mean minus two standard deviations (mean - 2*sd), computed within each gender.

example dataset:

```r
set.seed(8)
df = data.frame(sex = c(rep("M",100), rep("F",100)),
                ht = c(rnorm(100, mean=1.7, sd=.17), rnorm(100, mean=1.6, sd=.16)))
```

I'd like to do this in a single line of vectorized code. I'm fairly sure that is possible, but I do not know how to write it. I imagine there may be a way to use `cut()`, `apply()`, and/or `dplyr` to achieve this.

I just discovered the following solution using base R:

```r
# cut() is applied within each sex; the breaks are 0, (mean - 2 * sd), and Inf
df$ht_grp <- ave(x = df$ht, df$sex,
                 FUN = function(x)
                   cut(x, breaks = c(0, mean(x, na.rm = TRUE) - 2 * sd(x, na.rm = TRUE), Inf)))
```

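This works because I know that 0 and Inf are reasonable bounds for height, but I could also use `min(x)` and `max(x)` as my lower and upper bounds (adding `include.lowest = TRUE`, since `cut()` intervals are open on the left and would otherwise turn the minimum into NA). This splits the variable into low, high, and NA. One caveat: `ave()` writes the result back into the numeric input vector, so `ht_grp` holds the integer codes of the intervals (1 = low, 2 = high) rather than a labelled factor.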
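If explicit "low"/"high" labels are wanted, here is a minimal sketch of one alternative: use `ave()` only to compute the per-sex cutoff, then do a single vectorized comparison. The variable name `cutoff` and the label strings are my own choices for illustration, and I use `<=` so that the boundary matches the right-closed intervals that `cut()` produces by default.

```r
set.seed(8)
df <- data.frame(
  sex = c(rep("M", 100), rep("F", 100)),
  ht  = c(rnorm(100, mean = 1.7, sd = 0.17), rnorm(100, mean = 1.6, sd = 0.16))
)

# Per-row cutoff: mean - 2 * sd of ht within each sex.
# FUN returns a single number per group, which ave() recycles across the group.
cutoff <- ave(df$ht, df$sex,
              FUN = function(x) mean(x, na.rm = TRUE) - 2 * sd(x, na.rm = TRUE))

# One vectorized comparison, then explicit labels (illustrative names)
df$ht_grp <- factor(ifelse(df$ht <= cutoff, "low", "high"),
                    levels = c("low", "high"))

table(df$sex, df$ht_grp)
```

Because the labels are assigned outside `ave()`, the result stays a real factor with the same two levels for both sexes.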

My prior solution was the following two-step process, which is not so bad:

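```r
# Step 1: compute the per-sex cutoff (mean - 2 * sd) and merge it onto df
df <- merge(df,
            setNames(aggregate(ht ~ sex, df, FUN = function(x) mean(x) - 2 * sd(x)),
                     c("sex", "ht_cutoff")),
            by = "sex")

# Step 2: flag heights at or below the cutoff
df$ht_is_low <- ifelse(df$ht <= df$ht_cutoff, 1, 0)
```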
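The same two steps can also be sketched without `merge()`, which by default sorts the result on the `by` column and so reorders the rows. This variant, which is only one possible approach, looks each row's cutoff up by name in the output of `tapply()`. The name `cutoffs` is my own.

```r
set.seed(8)
df <- data.frame(
  sex = c(rep("M", 100), rep("F", 100)),
  ht  = c(rnorm(100, mean = 1.7, sd = 0.17), rnorm(100, mean = 1.6, sd = 0.16))
)

# Step 1: one cutoff per sex, returned as an array named by group ("F", "M")
cutoffs <- tapply(df$ht, df$sex, function(x) mean(x) - 2 * sd(x))

# Step 2: look up each row's cutoff by name, then compare.
# as.character() guards against sex being stored as a factor.
df$ht_cutoff <- as.numeric(cutoffs[as.character(df$sex)])
df$ht_is_low <- as.integer(df$ht <= df$ht_cutoff)
```

The rows of `df` stay in their original order here, which matters if other vectors are aligned to it.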