user12202013 - 1 year ago 109
R Question

Perform cumulative group operations with R and dplyr

I'm trying to process data based on a sequential group id. There are J groups, and for each j = 1..J I want to run the data processing function on all groups i <= j.

The most trivial case is when each row is its own group and you calculate the cumulative sum. However, I have multiple rows in each group and the processing is more complicated than summation.
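For reference, the trivial case can be sketched in base R as follows. The data frame `df` and its values are hypothetical, reusing the numbers from the example further down.

```r
# Hypothetical data: every row is its own group
df <- data.frame(group = 1:4, value = c(2065, 2075, 18008, 17655))

# Row j holds the sum over groups 1..j
df$cum_value <- cumsum(df$value)
df$cum_value
```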

Here is a minimal example of my data format:

``````
row | group | value
----|-------|------
  1 |     1 |  2065
  2 |     1 |  2075
  3 |     2 | 18008
  4 |     2 | 17655
  : |     : |     :
N-1 |   J-1 |  2345
  N |     J |  5432
``````

One solution I've thought of is to replicate my data, stack the copies, and reassign the group in each copy so that group i contains the rows of all groups up to and including i:

``````
row | group | value
----|-------|------
  1 |     1 |  2065
  2 |     1 |  2075
  3 |     2 |  2065
  4 |     2 |  2075
  5 |     2 | 18008
  6 |     2 | 17655
  : |     : |     :
``````

However this seems tedious and inefficient as my data will be copied many times.
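To make the cost concrete, here is a minimal base R sketch of the stacking idea. The data frame `df`, the helper names `groups`, `stacked` and `cum_mean`, and the choice of `mean` as the summary are all assumptions made for illustration.

```r
# Hypothetical example data: three groups with two rows each
df <- data.frame(
  group = c(1, 1, 2, 2, 3, 3),
  value = c(2065, 2075, 18008, 17655, 2345, 5432)
)

# For each group id g, copy all rows with group <= g and relabel them as g
groups <- sort(unique(df$group))
stacked <- do.call(rbind, lapply(groups, function(g) {
  part <- df[df$group <= g, , drop = FALSE]
  part$group <- g
  part
}))
rownames(stacked) <- NULL

# Any per-group summary on the stacked data is now cumulative,
# e.g. the mean of all values up to and including each group
cum_mean <- tapply(stacked$value, stacked$group, mean)

# The stacked copy has 2 + 4 + 6 = 12 rows instead of the original 6
nrow(stacked)
```

With J groups of equal size, the stacked copy grows roughly with J squared, which is the inefficiency described above.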

Does anyone know of a more efficient way of processing the data in a cumulative group-by fashion?

One approach that should work here is to split the data.frame by group id and then run a `for` loop (or `lapply`) over the cumulative groups. Below is an example using a `for` loop, as I think it will be more straightforward to implement.

``````
# split data.frame by group ID
myList <- split(df, df$group)
# initialize empty output list, one slot per group
myOutputList <- vector("list", length(myList))

# loop through group IDs, adding one more group on each pass
for (i in seq_along(myList)) {
  # create temporary df holding groups 1 through i
  myTempDf <- do.call(rbind, myList[seq_len(i)])

  ## perform analysis on myTempDf here ##

  # save results: replace the empty list with a (named) list of analysis output
  myOutputList[[i]] <- list()
}
``````

The output would be a nested list. I'd recommend naming each item in the nested list to make it easier to access, like `myOutputList[[i]][["regression.1"]]`.
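The `lapply` variant with named results might look like the sketch below. The example data and the two placeholder results (`n.rows`, `mean.value`) are hypothetical stand-ins for whatever analysis you actually run.

```r
# Hypothetical example data, sorted by group id
df <- data.frame(
  group = c(1, 1, 2, 2, 3, 3),
  value = c(2065, 2075, 18008, 17655, 2345, 5432)
)

myList <- split(df, df$group)

# One list element per cumulative set of groups: 1, then 1:2, then 1:3
myOutputList <- lapply(seq_along(myList), function(i) {
  myTempDf <- do.call(rbind, myList[seq_len(i)])
  # placeholder analysis; the names are illustrative only
  list(
    n.rows = nrow(myTempDf),
    mean.value = mean(myTempDf$value)
  )
})
names(myOutputList) <- names(myList)

# Access a single result by position and name
myOutputList[[2]][["n.rows"]]
```

Because `lapply` returns the list directly, there is no need to initialize `myOutputList` beforehand.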

Note that this assumes that the groups are properly sorted in the original data.frame and that the group ids are the counting numbers 1, 2, 3, 4, ... as in your example.
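If you would rather not rely on those assumptions, one option is to iterate over the sorted unique ids and subset with `<=`. This is only a sketch: the data, the name `cumSums`, and the use of `sum` as the analysis are hypothetical.

```r
# Hypothetical data: unsorted rows and non-consecutive group ids
df <- data.frame(
  group = c(30, 10, 20, 10, 30, 20),
  value = c(2345, 2065, 18008, 2075, 5432, 17655)
)

ids <- sort(unique(df$group))

# For each id, use every row whose group id is at most that id
cumSums <- vapply(ids, function(g) sum(df$value[df$group <= g]), numeric(1))
names(cumSums) <- ids
cumSums
```

This only requires that the group ids can be ordered, not that they start at 1 or are consecutive.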
