Kelsey - 1 month ago
R Question

How to use the aggregate and summary functions to get separate columns in a data frame?

I have used the aggregate function to get a summary of results based on their collection location. The result has 3 obs. of 2 variables: one variable is the group name, and the other holds the summary statistics for each group.

How do I get R to treat each column (Group, min, 1st quartile, median, etc.) as a separate variable in my data frame? Ultimately I'd like this to be 3 obs. of 7 variables, one for each column. Alternatively, I'd like to know how to cleanly get min, median, and max by Location. Thanks!

Result <- c(1,1,2,100,50,30,45,20, 10, 8)
Location <- c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha")

df <- data.frame(Result, Location)
head(df)

Agg <- aggregate(df$Result, list(df$Location), summary)

head(Agg)
  Group.1 x.Min. x.1st Qu. x.Median x.Mean x.3rd Qu. x.Max.
1   Alpha   1.00      6.25    26.50  38.50     58.75 100.00
2    Beta   1.00     10.50    20.00  23.67     35.00  50.00
3   Gamma   2.00      6.00    10.00  14.00     20.00  30.00

Answer

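Because aggregate's simplify parameter defaults to TRUE, the results of calling the function (here, summary) are simplified to a matrix, which is stored as a single column, x, of Agg. That is why Agg shows 3 obs. of 2 variables even though it prints with seven columns. You can reconstruct the data.frame by coercing that matrix column into its own data.frame: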

with(Agg, data.frame(Group.1, as.data.frame(x)))

##   Group.1 Min. X1st.Qu. Median  Mean X3rd.Qu. Max.
## 1   Alpha    1     6.25   26.5 38.50    58.75  100
## 2    Beta    1    10.50   20.0 23.67    35.00   50
## 3   Gamma    2     6.00   10.0 14.00    20.00   30
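To check the explanation above, you can inspect Agg directly. The following is a minimal sketch that rebuilds the question's df and Agg, confirms that x is a single matrix column, and shows do.call(data.frame, Agg) as another base R way to spread that matrix into separate columns. The name flat is hypothetical and used only for illustration.

```r
# Rebuild the question's data so the block runs on its own
Result <- c(1, 1, 2, 100, 50, 30, 45, 20, 10, 8)
Location <- c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma",
              "Alpha", "Beta", "Gamma", "Alpha")
df <- data.frame(Result, Location)
Agg <- aggregate(df$Result, list(df$Location), summary)

# Agg has only two columns; the second one is a 3 x 6 matrix
dim(Agg)
is.matrix(Agg$x)
dim(Agg$x)

# Passing each column of Agg to data.frame() expands the matrix
# into one column per statistic (names keep the "x." prefix)
flat <- do.call(data.frame, Agg)
dim(flat)
```

Unlike the with() call above, this variant keeps the x. prefix on the statistic columns, so pick whichever naming you prefer.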

Alternatively, dplyr's summarise family of functions handles multiple summary statistics well (note that funs() is deprecated as of dplyr 0.8.0):

library(dplyr)

df %>% group_by(Location) %>% summarise_all(funs(min, median, max))

## # A tibble: 3 × 4
##   Location   min median   max
##     <fctr> <dbl>  <dbl> <dbl>
## 1    Alpha     1   26.5   100
## 2     Beta     1   20.0    50
## 3    Gamma     2   10.0    30
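On newer dplyr versions the same result can probably be obtained with across(), which replaces the funs() helper. This is a sketch that assumes dplyr 1.0.0 or later; the name out and the .names pattern are choices made here to reproduce the column names shown above, not part of the original answer.

```r
library(dplyr)

# Rebuild the question's data so the block runs on its own
Result <- c(1, 1, 2, 100, 50, 30, 45, 20, 10, 8)
Location <- c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma",
              "Alpha", "Beta", "Gamma", "Alpha")
df <- data.frame(Result, Location)

# across() applies each named function to Result within each group;
# .names = "{.fn}" names the output columns after the functions alone
out <- df %>%
  group_by(Location) %>%
  summarise(across(Result, list(min = min, median = median, max = max),
                   .names = "{.fn}"))
out
```

Without the .names argument the columns would be called Result_min, Result_median, and Result_max, which is useful when more than one column is summarised.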

If you really want all of summary, you can use broom::tidy to turn the results into a data.frame. Inside do(), the current group is referred to as ., so the column is accessed as .$Result:

df %>% group_by(Location) %>% do(broom::tidy(summary(.$Result)))

## Source: local data frame [3 x 7]
## Groups: Location [3]
## 
##   Location minimum    q1 median  mean    q3 maximum
##     <fctr>   <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
## 1    Alpha       1  6.25   26.5 38.50 58.75     100
## 2     Beta       1 10.50   20.0 23.67 35.00      50
## 3    Gamma       2  6.00   10.0 14.00 20.00      30
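If you would rather avoid extra packages, a full per-group summary can also be built in base R. The sketch below is one possible approach, not the method used above: it splits Result by Location, applies summary to each piece, and moves the group names into a regular column. The names stats and out are hypothetical.

```r
# Rebuild the question's data so the block runs on its own
Result <- c(1, 1, 2, 100, 50, 30, 45, 20, 10, 8)
Location <- c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma",
              "Alpha", "Beta", "Gamma", "Alpha")
df <- data.frame(Result, Location)

# One summary() per Location; sapply() binds them as columns,
# and t() flips the result so that each group is a row
stats <- t(sapply(split(df$Result, df$Location), summary))

# Move the group names from the row names into a regular column;
# check.names = FALSE keeps names such as "1st Qu." unchanged
out <- data.frame(Location = rownames(stats), stats,
                  row.names = NULL, check.names = FALSE)
out
```

The result has 3 rows and 7 columns, which matches the shape the question asks for.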