Parker Parker - 19 days ago 5
R Question

R - using aggregate on several variables and combine unique values

Suppose I have a dataframe cars, and the first several lines are:

Brand  Type     Year
BMW    Compact  2009
BMW    Sedan    2010
BENZ   Sedan    2010
BENZ   Compact  2012
BMW    Compact  2008
BENZ   Sedan    2011


I want to use aggregate to find each combination of the variables "Brand" and "Type", and then count the number of unique years within each combination. For example, the desired output looks like:

Brand  Type     num_unique_years
BMW    Compact  2 (2009, 2008)
BMW    Sedan    1 (2010)
BENZ   Compact  1 (2012)
BENZ   Sedan    2 (2010, 2011)


The steps are basically this:

x <- subset(cars, Brand == "BMW" & Type == "Compact")
length(unique(x$Year))

which gives me the output 2.


However, I don't know how to combine these individual steps in one function.

Thanks for help

Answer

Just define the appropriate aggregation function and pass it to aggregate. If you only need the counts and not the years themselves, replace the line marked ## with length(u). No packages are used.

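len_years <- function(years) {
  u <- unique(sort(years))                  # distinct years, in ascending order
  paste0(length(u), "(", toString(u), ")") ## count, then the years in parentheses
}
# Year is summarised; every other column (Brand, Type) defines the groups
Ag <- aggregate(Year ~ ., cars, len_years)
names(Ag)[3] <- "num_unique_years"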

giving:

> Ag
  Brand    Type num_unique_years
1  BENZ Compact          1(2012)
2   BMW Compact    2(2008, 2009)
3  BENZ   Sedan    2(2010, 2011)
4   BMW   Sedan          1(2010)

Alternatively, replace the last two lines of code (the aggregate call and the names assignment that follows it) with:

Ag <- aggregate(data.frame(num_unique_years = cars[[3]]), cars[-3], len_years)
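As a rough check, the two forms can be run side by side. This is only a sketch: it rebuilds the example data so that it runs on its own, and the names ag_formula and ag_df are made up here for illustration.

```r
# Example data from the question, rebuilt so this block is self-contained.
Lines <- "Brand Type Year
BMW Compact 2009
BMW Sedan 2010
BENZ Sedan 2010
BENZ Compact 2012
BMW Compact 2008
BENZ Sedan 2011"
cars <- read.table(text = Lines, header = TRUE)

len_years <- function(years) {
  u <- unique(sort(years))
  paste0(length(u), "(", toString(u), ")")
}

# Formula interface: Year is summarised, all other columns are grouping variables.
ag_formula <- aggregate(Year ~ ., cars, len_years)
names(ag_formula)[3] <- "num_unique_years"

# Data-frame interface: the first argument holds the values to summarise,
# the second holds the grouping columns; the result column is named up front.
ag_df <- aggregate(data.frame(num_unique_years = cars[[3]]), cars[-3], len_years)

# Compare the summarised column produced by the two calls.
same_summary <- all(ag_formula$num_unique_years == ag_df$num_unique_years)
print(same_summary)
```

The second form avoids the separate renaming step because the summarised column already carries its final name.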

Note: The input cars in reproducible form is:

Lines <- "Brand         Type         Year
BMW           Compact      2009
BMW           Sedan        2010
BENZ          Sedan        2010
BENZ          Compact      2012
BMW           Compact      2008
BENZ          Sedan        2011"
cars <- read.table(text = Lines, header = TRUE)
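If only the counts are wanted, a minimal sketch along the same lines is shown here. The helper name n_unique and the result name counts are hypothetical, and the grouping columns are spelled out as Brand + Type instead of the dot shorthand.

```r
# Example data from the question, rebuilt so this block is self-contained.
Lines <- "Brand Type Year
BMW Compact 2009
BMW Sedan 2010
BENZ Sedan 2010
BENZ Compact 2012
BMW Compact 2008
BENZ Sedan 2011"
cars <- read.table(text = Lines, header = TRUE)

# Hypothetical helper: number of distinct values in a vector.
n_unique <- function(years) length(unique(years))

# One row per Brand/Type combination, holding the number of distinct years.
counts <- aggregate(Year ~ Brand + Type, data = cars, FUN = n_unique)
names(counts)[3] <- "num_unique_years"
print(counts)
```

For the BMW/Compact combination this gives 2, the same value as the manual subset and length(unique(...)) steps in the question.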