jmb277 - 3 years ago 197
R Question

R: using 'funs' within a `summarise_each` to generate a confidence interval:

``````library(tidyverse) # library(dplyr) # would probably work, too.
``````

Given a data frame:

``````my_df <- data.frame(run = c(1, 2, 3, 4),
                    w_t = c(5.452595, 4.719883, 5.110823, 5.009686),
                    L = c(4.212980, 4.674020, 3.849464, 3.971810),
                    mu = c(0.9962918, 1.0141293, 0.9962637, 0.9954),
                    n = c(4, 4, 4, 4))
``````

Note that this is a small subset of the actual data set (many more columns are not shown). I'd like to generate a number of stats and I use `dplyr` to do so:

``````my_stats <- my_df %>%
  ungroup() %>%
  select(w_t, L, mu) %>%
  summarise_each(funs(mean, sd, min, max))
``````

This works and produces a data frame with 12 columns named in the format `colname_stat`.

My question: Is there a way to insert a function into the `summarise_each` call such that the result also contains the 95% confidence interval? That is, it would look something like `summarise_each(funs(mean, sd, min, max, blah))`, where `blah` would be a function that's called or an equation that I put in. It could be in two parts, so that I enter one equation for the lower bound and one for the upper bound.

I've created a function that gives me the half-width of a confidence interval, but I haven't figured out how to make it work inside the `funs` part of the statement. It looks like this:

``````my_ct <- function(s, n, ci){
  # you must enter the ci in decimal e.g. .95
  z_t <- qt( 1-(1-ci)/2, df = n-1)
  h <- z_t * s/sqrt(n)
  return(h)
}
``````

I'm arranging the data in this manner for comparison and a data frame provides a flexible format for me to present.
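
For reference, here is a minimal sketch of how `my_ct` can be called directly on a single column, outside of `dplyr`. It assumes the standard deviation and sample size are computed by hand with `sd()` and `length()`. The comparison against `t.test()` from base R's `stats` package is an added sanity check, not part of the original question.

```r
# Half-width helper, copied from the question above.
my_ct <- function(s, n, ci) {
  z_t <- qt(1 - (1 - ci) / 2, df = n - 1)
  z_t * s / sqrt(n)
}

# The w_t column from the example data frame.
x <- c(5.452595, 4.719883, 5.110823, 5.009686)

# s and n have to be supplied explicitly.
h <- my_ct(s = sd(x), n = length(x), ci = 0.95)
manual_ci <- c(mean(x) - h, mean(x) + h)

# Cross-check (assumption: a one-sample t interval is what is wanted).
t_ci <- t.test(x, conf.level = 0.95)$conf.int

print(manual_ci)
print(all.equal(manual_ci, as.numeric(t_ci)))
```

Because the function needs `s` and `n` as separate arguments, it cannot be dropped straight into `funs()`, which passes only the column itself to each function.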

How about this? There is no need to pass `n` or `s`; you can calculate both within your function:

``````get_CI_half_width <- function(x, prob) {
  n <- length(x)
  z_t <- qt(1 - (1 - prob) / 2, df = n - 1)
  z_t * sd(x) / sqrt(n)
}

lower <- function(x, prob = 0.95) {
  mean(x) - get_CI_half_width(x, prob)
}

upper <- function(x, prob = 0.95) {
  mean(x) + get_CI_half_width(x, prob)
}

my_df %>%
  ungroup() %>%
  select(w_t, L, mu) %>%
  summarise_all(funs(mean, sd, min, max, lower, upper))
``````

Gives:

``````  w_t_mean   L_mean  mu_mean   w_t_sd      L_sd       mu_sd  w_t_min    L_min mu_min  w_t_max   L_max   mu_max w_t_lower  L_lower
1 5.073247 4.177068 1.000521 0.302337 0.3640999 0.009081505 4.719883 3.849464 0.9954 5.452595 4.67402 1.014129  4.592161 3.597704
mu_lower w_t_upper  L_upper mu_upper
1 0.9860705  5.554332 4.756433 1.014972
``````
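
In more recent versions of `dplyr`, `funs()` and the scoped verbs such as `summarise_each()` and `summarise_all()` have been deprecated or superseded. Below is a sketch of the same summary written with `across()`, assuming `dplyr` 1.0.0 or later is installed. The helper functions are the ones from the answer above. The `my_stats` name and the named list of functions are choices made for this illustration.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

my_df <- data.frame(run = c(1, 2, 3, 4),
                    w_t = c(5.452595, 4.719883, 5.110823, 5.009686),
                    L = c(4.212980, 4.674020, 3.849464, 3.971810),
                    mu = c(0.9962918, 1.0141293, 0.9962637, 0.9954),
                    n = c(4, 4, 4, 4))

# Helpers from the answer above.
get_CI_half_width <- function(x, prob) {
  n <- length(x)
  z_t <- qt(1 - (1 - prob) / 2, df = n - 1)
  z_t * sd(x) / sqrt(n)
}
lower <- function(x, prob = 0.95) mean(x) - get_CI_half_width(x, prob)
upper <- function(x, prob = 0.95) mean(x) + get_CI_half_width(x, prob)

# A named list of functions replaces funs(); the list names become
# the suffixes of the output columns (e.g. w_t_lower).
my_stats <- my_df %>%
  summarise(across(
    c(w_t, L, mu),
    list(mean = mean, sd = sd, min = min, max = max,
         lower = lower, upper = upper)
  ))

print(my_stats)
```

The values should match the output above. The column order differs, though: `across()` groups the results by input column (`w_t_mean`, `w_t_sd`, ...) rather than by statistic.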