Kelsey - 10 days ago 4
R Question

# How to use aggregate and summary function to get unique columns in a dataframe?

I have used the aggregate function to get a summary of results based on their collection location. The summary returns 3 obs. of 2 variables. One variable is the group name, one is the summary statistic per group.

How do I get R to view each column (Group, min, 1st quartile, median, etc.) as unique in my data frame? Ultimately I'd like this to be 3 obs. of 7 variables, one for each column. OR I'd like to know how to cleanly get min, median, and max by Location. Thanks!

``````Result <- c(1,1,2,100,50,30,45,20, 10, 8)
Location <- c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma", "Alpha")

df <- data.frame(Result, Location)

Agg <- aggregate(df\$Result, list(df\$Location), summary)

Group.1 x.Min. x.1st Qu. x.Median x.Mean x.3rd Qu. x.Max.
1   Alpha   1.00      6.25    26.50  38.50     58.75 100.00
2    Beta   1.00     10.50    20.00  23.67     35.00  50.00
3   Gamma   2.00      6.00    10.00  14.00     20.00  30.00
``````

Since `aggregate`'s `simplify` parameter defaults to `TRUE`, it's simplifying the results of calling the function (here, `summary`) to a matrix. You can reconstruct the data.frame, coercing the column into its own data.frame:

``````with(Agg, data.frame(Group.1, as.data.frame(x)))

##   Group.1 Min. X1st.Qu. Median  Mean X3rd.Qu. Max.
## 1   Alpha    1     6.25   26.5 38.50    58.75  100
## 2    Beta    1    10.50   20.0 23.67    35.00   50
## 3   Gamma    2     6.00   10.0 14.00    20.00   30
``````

Alternately, dplyr's `summarise` family of functions can handle multiple summary statistics well:

``````library(dplyr)

df %>% group_by(Location) %>% summarise_all(funs(min, median, max))

## # A tibble: 3 × 4
##   Location   min median   max
##     <fctr> <dbl>  <dbl> <dbl>
## 1    Alpha     1   26.5   100
## 2     Beta     1   20.0    50
## 3    Gamma     2   10.0    30
``````

If you really want all of `summary`, you can use `broom::tidy` to turn the results into a data.frame:

``````df %>% group_by(Location) %>% do(broom::tidy(summary(Result)))

## Source: local data frame [3 x 7]
## Groups: Location [3]
##
##   Location minimum    q1 median  mean    q3 maximum
##     <fctr>   <dbl> <dbl>  <dbl> <dbl> <dbl>   <dbl>
## 1    Alpha       1  6.25   26.5 38.50 58.75     100
## 2     Beta       1 10.50   20.0 23.67 35.00      50
## 3    Gamma       2  6.00   10.0 14.00 20.00      30
``````