min - 2 months ago 12

R Question

I have a list called `d` that contains 244 data frames. Here is the first one, `d[[1]]`:

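|   | year | pos | days | sal  |
|---|------|-----|------|------|
| 1 | 2009 | A   | 31   | 2000 |
| 2 | 2009 | B   | 60   | 4000 |
| 3 | 2009 | C   | 10   | 600  |
| 4 | 2010 | B   | 10   | 1000 |
| 5 | 2010 | D   | 90   | 7000 |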
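To make the question reproducible, here is a sketch that rebuilds `d[[1]]` from the printed output. The name `df1` is hypothetical, and filling `d` with 244 copies of it is only an assumption for illustration; the real list presumably holds different data with the same columns.

```r
# Hypothetical reconstruction of d[[1]] from the printed output
df1 <- data.frame(
  year = c(2009, 2009, 2009, 2010, 2010),
  pos  = c("A", "B", "C", "B", "D"),
  days = c(31, 60, 10, 10, 90),
  sal  = c(2000, 4000, 600, 1000, 7000),
  stringsAsFactors = FALSE
)

# Assumption: for illustration only, every element of d is a copy of df1
d <- rep(list(df1), 244)

length(d)
str(d[[1]])
```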

I would like to group the data by `year`, summing `days` and `sal`, and keep the `pos` of the row where `days` is largest within each group.

The expected result is:

|   | year | pos | days | sal  |
|---|------|-----|------|------|
| 1 | 2009 | B   | 101  | 6600 |
| 2 | 2010 | D   | 100  | 8000 |

I know how to do this for a single data frame.

I did it this way:

```
library(dplyr)

# group_by() takes the data frame first, then the grouping column
ygroup <- group_by(d[[1]], year)
summarise(ygroup, pos = pos[which.max(days)], days = sum(days), sal = sum(sal))
```
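The expression `pos[which.max(days)]` picks, within each group, the `pos` on the row with the largest `days`. In dplyr, each argument to `summarise()` can use columns created by earlier arguments, so it matters that `pos = ...` comes before `days = sum(days)`; otherwise `days` would already be the single summed value. A small base R illustration using the values of the 2009 group:

```r
# Values from the 2009 group of d[[1]]
pos  <- c("A", "B", "C")
days <- c(31, 60, 10)

which.max(days)        # index of the row with the most days
pos[which.max(days)]   # the pos on that row
sum(days)              # total days for the group

# On ties, which.max() returns the first maximum
which.max(c(5, 9, 9))
```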

But I want to apply the same operation to all 244 data frames in the list `d`.

I tried this:

```
e <- list()
ygroup <- list()

for (i in 1:244) {
  ygroup[[i]] <- group_by(d[[i]]$year)
  e[[i]] <- summarise(ygroup[[i]], pos = pos[which.max(days)], days = sum(days), sal = sum(sal))
}
```

But this doesn't work; I get the following error:

`Error: expecting a single value`

I think the problem is in this part:

`pos = pos[which.max(days)]`

How can I solve this?

Any comments will be greatly appreciated! :)

Answer

We can use `lapply` with an anonymous function to loop over the `list` of `data.frame`s (`d`):

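```
library(dplyr)

# Apply the same per-year summary to every data frame in d
res <- lapply(d, function(x) x %>%
  group_by(year) %>%
  summarise(pos = pos[which.max(days)],  # computed before days is overwritten
            days = sum(days),
            sal = sum(sal)))
```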
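If you would rather avoid the dplyr dependency, here is a base R sketch of the same per-year summary, swapping in `split()` and `do.call(rbind, ...)` for `group_by()` and `summarise()`. The helper name `summarise_by_year` is hypothetical, and the one-element `d` below only stands in for the real 244-element list.

```r
# Example input: a one-element list standing in for the real d
d <- list(data.frame(
  year = c(2009, 2009, 2009, 2010, 2010),
  pos  = c("A", "B", "C", "B", "D"),
  days = c(31, 60, 10, 10, 90),
  sal  = c(2000, 4000, 600, 1000, 7000),
  stringsAsFactors = FALSE
))

# Hypothetical helper: summarise one data frame by year
summarise_by_year <- function(x) {
  groups <- split(x, x$year)  # one data frame per year
  rows <- lapply(groups, function(g) {
    data.frame(
      year = g$year[1],
      pos  = g$pos[which.max(g$days)],  # pos on the row with the most days
      days = sum(g$days),
      sal  = sum(g$sal),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

# Apply it to every data frame in the list
res <- lapply(d, summarise_by_year)

# Optionally stack all results into one data frame, tagged with the list index
combined <- do.call(rbind, Map(function(r, i) cbind(list_index = i, r),
                               res, seq_along(res)))
```

With dplyr loaded, `bind_rows(res, .id = "list_index")` stacks the list in a similar way.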