user12202013 - 4 months ago
R Question

Perform cumulative group operations with R and dplyr

I'm trying to process data based on a sequential group id. There are J groups, and for each j = 1..J I want to run the data processing function on all rows belonging to groups i <= j.

The most trivial case is when each row is its own group and you calculate the cumulative sum. However, I have multiple rows in each group, and the processing is more complicated than summation.

Here is a minimal example of my data format:

row | group | value
----|-------|------
1 | 1 | 2065
2 | 1 | 2075
3 | 2 | 18008
4 | 2 | 17655
: | : | :
N-1 | J-1 | 2345
N | J | 5432
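
For a decomposable statistic such as the cumulative sum mentioned above, earlier groups never need to be re-processed: each group can be summarised once and the per-group results accumulated in group-id order. A minimal base R sketch, assuming a data frame `df` with the `group` and `value` columns shown (only the first four rows of the table are reproduced; the variable names are illustrative):

```r
# Rows 1-4 of the example table above (the elided rows are omitted)
df <- data.frame(
  row   = 1:4,
  group = c(1, 1, 2, 2),
  value = c(2065, 2075, 18008, 17655)
)

# Summarise each group once (tapply orders results by sorted group id),
# then accumulate across groups: entry j is the sum over groups i <= j.
# This shortcut only works for decomposable statistics such as sums.
group_totals <- tapply(df$value, df$group, sum)
cumulative_totals <- cumsum(group_totals)
print(cumulative_totals)
```

Non-decomposable processing (a median, a model fit, etc.) cannot be built up this way, which is why the rest of this thread re-assembles the rows for each j.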


One solution I've thought of is to replicate my data, stack the copies, and reassign the groups in each copy so that new group j contains all rows of the original groups i <= j:

row | group | value
----|-------|------
1 | 1 | 2065
2 | 1 | 2075
3 | 2 | 2065
4 | 2 | 2075
5 | 2 | 18008
6 | 2 | 17655
: | : | :


However, this seems tedious and inefficient, as my data would be copied many times.
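
For reference, the stacking approach described in the question can be sketched in base R as below. This is an illustrative sketch using the four example rows; `stacked` and `cumulative_means` are hypothetical names, and the mean stands in for whatever processing is actually needed.

```r
df <- data.frame(
  row   = 1:4,
  group = c(1, 1, 2, 2),
  value = c(2065, 2075, 18008, 17655)
)

group_ids <- sort(unique(df$group))

# For each group id j, copy every row whose original group is <= j
# and relabel that copy with the new group id j.
stacked <- do.call(rbind, lapply(group_ids, function(j) {
  block <- df[df$group <= j, ]
  block$group <- j
  block
}))
# Renumber rows 1..nrow to match the second table in the question
stacked$row <- seq_len(nrow(stacked))
rownames(stacked) <- NULL

# Any ordinary grouped operation on the stacked data is now cumulative
cumulative_means <- tapply(stacked$value, stacked$group, mean)
```

Each row of original group i is copied J - i + 1 times, so with similarly sized groups the stacked data has roughly (J + 1) / 2 times as many rows as the original, which is the memory cost the question is concerned about.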

Does anyone know of a more efficient way of processing the data in a cumulative group-by fashion?

lmo
Answer

One approach that should work here is to split the data.frame by group id and then run a for loop (or lapply) over the cumulative groups. Below is an example using a for loop, as I think it will be more straightforward to implement.

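# split data.frame by group ID (pieces are ordered by sorted group ID)
myList <- split(df, df$group)
# initialize the output list with one slot per group
myOutputList <- vector("list", length(myList))

# loop through the groups; iteration i uses groups 1 through i
for(i in seq_along(myList)) {
  # create temporary df holding all rows of groups 1 through i
  myTempDf <- do.call(rbind, myList[seq_len(i)])

  ## perform analysis on myTempDf here ##

  # save results (placeholder summaries; replace with your own analysis output)
  myOutputList[[i]] <- list(n.rows = nrow(myTempDf),
                            mean.value = mean(myTempDf$value))
}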

The output would be a nested list. I'd recommend naming each item in the nested list to make it easier to access, like myOutputList[[i]][["regression.1"]].
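
The same loop can be written with lapply, which also makes it easy to name the results. A minimal sketch, assuming the four example rows from the question; the result names `n.rows` and `mean.value` are hypothetical placeholders for the real analysis output:

```r
df <- data.frame(
  group = c(1, 1, 2, 2),
  value = c(2065, 2075, 18008, 17655)
)

myList <- split(df, df$group)

myOutputList <- lapply(seq_along(myList), function(i) {
  # all rows from groups 1 through i
  myTempDf <- do.call(rbind, myList[seq_len(i)])
  # placeholder analysis: replace with your own (names are hypothetical)
  list(n.rows = nrow(myTempDf),
       mean.value = mean(myTempDf$value))
})
# label each result by the last group it includes
names(myOutputList) <- names(myList)

# access one named item of the nested list
myOutputList[["2"]][["mean.value"]]
```

Naming the outer list by group id means `myOutputList[["2"]]` refers to "groups up to and including 2" even if the ids are not 1, 2, 3, ...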

Note that this assumes that the group ids are the counting numbers 1, 2, 3, 4, ... as in your example, or more generally that they sort into the intended cumulative order, because split() orders its pieces by sorted group id.
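
To see what that assumption means in practice, here is a small sketch with hypothetical group ids: split() orders its pieces by the sorted factor levels of the grouping variable, not by where rows appear in the data, while rows inside each piece keep their original order. Numeric ids sort numerically, but the same ids stored as character sort as strings.

```r
# Rows deliberately out of order; group ids are numeric (hypothetical data)
df <- data.frame(
  group = c(10, 2, 1, 2, 10),
  value = c(5, 4, 3, 2, 1)
)

# Numeric ids: pieces come out in numeric order "1", "2", "10"
numeric_order <- names(split(df, df$group))

# Same ids as character: string sorting puts "10" before "2"
character_order <- names(split(df, as.character(df$group)))

# Within a piece, rows keep their original relative order
split(df, df$group)[["2"]]$value
```

So with character ids, myList[seq_len(i)] would accumulate groups in string order, which is probably not the intended cumulative order.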