user3570187 user3570187 - 3 months ago 9
R Question

Grouping variables based on location, year and person name

I would like to create an aggregate of columns

A<- c("xyz", "xyz", "xy", "xx","xx", "y")
year<- c(2009,2010,2009,2009,2010,2009)
location<- c('london', 'london', 'paris', 'newyork','mumbai','sydney')
df<- data.frame(A, year, location)


I would like to create a variable called 'yearsofexperience'which will summarize the total numbers of years spent by one person in a given location.

A year location yearsofexperience
xyz 2009 london 2
xyz 2010 london 2
xy 2009 paris 1
xx 2009 newyork 1
xx 2010 mumbai 1
y 2009 sydeny 1


Can someone help?

Answer

You can use n_distinct() to count the unique years for each combination of person and location. This should work for you:

library(dplyr)
df %>% group_by(A, location) %>% mutate(yoe = n_distinct(year))

# Source: local data frame [6 x 4]
# Groups: A, location [5]

#       A  year location   yoe
#  <fctr> <dbl>   <fctr> <int>
#1    xyz  2009   london     2
#2    xyz  2010   london     2
#3     xy  2009    paris     1
#4     xx  2009  newyork     1
#5     xx  2010   mumbai     1
#6      y  2009   sydney     1

You can also use data.table syntax and the corresponding function is uniqueN():

library(data.table)
setDT(df)[, yoe := uniqueN(year), .(A, location)]