swk swk - 4 years ago 157
R Question

R new variable based on other column

Using the built-in dataset 'cars' in R, I would like to add a new column that holds the average of the column 'dist' for each value of the column 'speed', that is, with 'speed' treated as the grouping variable.

So first I need 19 groups reflecting the unique speeds in cars$speed:

4 7 8 9 10 11 12 13 14 15 16 17 18 19 20 22 23 24 25
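
These groups can be listed directly in base R. A small sketch (the variable name `speeds` is only illustrative):

```r
# Distinct speed values in the built-in cars dataset, in ascending order
speeds <- sort(unique(cars$speed))
speeds

# Number of groups
length(speeds)  # 19
```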


For each of these 19 groups I would like to know the average dist, but only if at least one of the entries in the group meets a criterion (e.g. at least one dist value is above 20).

With the cars-dataset I would get something like this back for the cars with speed 4 to 12:

speed  dist  avr_dist_if_one_speed_is_above20
4      2     none
4      10    none
7      4     13
7      22    13
8      16    none
9      10    none
10     18    26
10     26    26
10     34    26
11     17    22.5
11     28    22.5
12     14    21.5
12     20    21.5
12     24    21.5
12     28    21.5
...


Since the 2 cars that have speed 4 both have a dist below 20, I do not get an average for these two entries. For the cars that have speed 7 I get an average dist of 13, since at least one car with speed 7 has a dist above 20.

For the cars with speed 8 and 9 I do not get an average, as both of these cars have a dist below 20. The cars with speed 10 should return an average of 26, since two of the cars with speed 10 have a dist above 20.

For cars with speed 11 I get 22.5

For cars with speed 12 I get 21.5.

The R code should calculate an average dist for all the remaining speed categories, as the rest all include cars with dist > 20.
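
Which speed groups meet the criterion can be checked with base R's `tapply()`. A rough sketch, where `qualifies` is just an illustrative name:

```r
# TRUE for each speed group that contains at least one dist above 20
qualifies <- tapply(cars$dist > 20, cars$speed, any)

# Speeds whose groups do not qualify and should therefore get no average
names(qualifies)[!qualifies]  # "4" "8" "9"
```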

Answer

This will do what you are looking for if I understand your question right.

library(dplyr)

# One row per speed: group size, and mean dist or NA when no dist in the group exceeds 20
cars %>%
        group_by(speed) %>%
        summarise(n = n(),
                  avg_dist = ifelse(any(dist > 20), mean(dist, na.rm = TRUE), NA))
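
If you want the value as a new column on every row, as the question describes, rather than one row per speed, `mutate()` can take the place of `summarise()`. A minimal sketch, assuming dplyr is installed; the names `cars_avg` and `avg_dist` are illustrative choices:

```r
library(dplyr)

cars_avg <- cars %>%
        group_by(speed) %>%
        # mutate() keeps all rows; the single group value is recycled to every row in the group
        mutate(avg_dist = if (any(dist > 20)) mean(dist, na.rm = TRUE) else NA_real_) %>%
        ungroup()

head(cars_avg, 4)
```

Groups with no dist above 20 get `NA` here instead of the text "none", which keeps the column numeric.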