Zach
R Question

Accessing Data after Splitting into Lists

This is probably a very beginner question, but searching the web (and SO) hasn't helped me find the answer, despite trying quite a few solutions. Here's the problem:

I have a CSV dataset with many columns, for example yearID, X, Y, and Z. I read it in using:

data<-read.csv("/foo/bar.csv")
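To try the steps below without the real file, one option is to pass read.csv() a small inline CSV through its text argument instead of a path. This is a minimal sketch; the values are made up for illustration and are not the asker's data.

```r
# Hypothetical stand-in for /foo/bar.csv: two rows per year, made-up values
csv_text <- "yearID,X,Y,Z
2010,10,20,5
2010,30,10,8
2011,5,15,4
2011,20,20,10"

# read.csv() accepts a literal string via `text` instead of a file path;
# header = TRUE is the default, so the first line becomes the column names
data <- read.csv(text = csv_text)
str(data)
```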

From there, I use X, Y, and Z to calculate A for each row:
data$A <- (data$X + data$Y) / data$Z
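A bare X in an expression like (X+Y)/Z is looked up as an ordinary variable, not as a column of data, so unless X, Y, and Z happen to exist separately in the workspace it fails with an object-not-found error. The columns can be referenced explicitly with data$X, or the expression can be evaluated inside the data frame with with(). A sketch on made-up values:

```r
# Made-up example data using the question's column names
data <- data.frame(yearID = c(2010, 2010, 2011),
                   X = c(10, 30, 5),
                   Y = c(20, 10, 15),
                   Z = c(5, 8, 4))

# Explicit column references
data$A <- (data$X + data$Y) / data$Z

# Equivalent: with() evaluates the expression using the columns of data
A2 <- with(data, (X + Y) / Z)

data$A              # 6 5 5
identical(data$A, A2)
```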


Now I want to plot the average A in each year, so I do:

list_df <- split(data, data$yearID)

Hooray: if I run summary(list_df[[5]]), I see a summary of X, Y, Z, and A for the fifth year.
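split() returns a named list with one data frame per distinct yearID, and the list names are the year values as character strings. That means a year can be pulled out by name instead of by position, so there is no need to remember which year is the fifth element. A sketch with made-up data:

```r
# Made-up data: two rows for 2010, one each for 2011 and 2012
data <- data.frame(yearID = c(2010, 2010, 2011, 2012),
                   A = c(6, 5, 5, 2))

list_df <- split(data, data$yearID)

names(list_df)             # "2010" "2011" "2012"
length(list_df)            # one data frame per year
list_df[["2011"]]          # rows for 2011, selected by name
mean(list_df[["2010"]]$A)  # 5.5
```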

Here is where I'm stuck. I then try something like:

for (year in list_df) {
  xy <- data.frame(mean(year$yearID, na.rm = TRUE), mean(year$A, na.rm = TRUE))
}


This loop "works" (it doesn't throw an error), but xy ends up holding only the last year and the average A for that year. Ultimately, I want to plot average A against yearID. I've tried a number of variations of the for loop based on other code examples I've found, but none has given me a working solution yet. Suggestions on any part of this process are most welcome, as I've just started learning R.
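The loop keeps only the last year because xy is reassigned on every pass, so each iteration overwrites the previous result. One way to keep a loop is to store each year's row in a list and bind the rows together at the end; a more idiomatic alternative is sapply() over the split list. Both are sketched below on made-up data; the result column names yearID and avgA are my own choice, not from the question.

```r
# Made-up data, including an NA to show the effect of na.rm = TRUE
data <- data.frame(yearID = c(2010, 2010, 2011, 2011, 2012),
                   A = c(6, 4, 5, NA, 2))
list_df <- split(data, data$yearID)

# Loop version: collect one row per year instead of overwriting xy
rows <- vector("list", length(list_df))
for (i in seq_along(list_df)) {
  year <- list_df[[i]]
  rows[[i]] <- data.frame(yearID = year$yearID[1],
                          avgA = mean(year$A, na.rm = TRUE))
}
xy <- do.call(rbind, rows)  # stack the one-row data frames

# sapply version: one mean per list element, named by year
avgA <- sapply(list_df, function(d) mean(d$A, na.rm = TRUE))
xy2 <- data.frame(yearID = as.numeric(names(avgA)), avgA = unname(avgA))

xy
```

Either result can then be plotted with plot(xy$yearID, xy$avgA).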

Cheers,
Zach

Answer

Unless you need the list split out for other reasons, you can use aggregate:

data <- data.frame(yearId=rep(2010:2014,each=2),X=runif(10,1,100),Y=runif(10,50,150),Z=runif(10,100,200))
data$A <- (data$X+data$Y)/data$Z

data2 <- aggregate(A~yearId,data,mean)
plot(data2$yearId,data2$A)
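One detail worth knowing about the formula interface of aggregate(): its na.action argument defaults to na.omit, so rows where A is NA are dropped before mean is applied, which plays the same role as na.rm = TRUE in the question's loop. tapply() is another base R option that returns the per-year means as a named vector. A small check on made-up data (the NA is deliberate):

```r
# Made-up data with one missing A value in 2010
data <- data.frame(yearId = c(2010, 2010, 2011, 2011),
                   A = c(6, NA, 5, 3))

# Formula method: the NA row is dropped by the default na.action = na.omit
data2 <- aggregate(A ~ yearId, data, mean)
data2$A  # 6 4

# tapply(): same per-year means; na.rm = TRUE is passed through to mean()
tapply(data$A, data$yearId, mean, na.rm = TRUE)
```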