vahab - 2 months ago 5x
R Question

How to pull out values corresponding to a random selection and get the cumulative summation of them?

Let's say I have a data frame with two columns for now:

``````df<- data.frame(scores_set1=c(32,45,65,96,45,23,23,14),
scores_set2=c(32,40,60,98,21,23,21,63))
``````

I want to randomly select some rows

``````selected_indeces<- sample(c(1:8), 4, replace = FALSE)
``````

Now I want to add up the values at `selected_indeces` sequentially: for the first selected index I just need the value of that row; for the second I want the second selected row's value plus the first selected value; and for the nth index I want the sum of all values selected so far plus the value of the nth selected row. So I first need a matrix to put the results in:

``````cumulative_loss <- matrix(rep(NA, 8 * 2), nrow = 8, ncol = 2)
``````

and then one loop over the columns and another over the selected indices:

``````for (s in 1:ncol(df)) { # for each column
  for (i in 1:length(selected_indeces)) { # for each randomly selected index
    if (i == 1) {
      cumulative_loss[i, s] <- df[selected_indeces[i], s]
    }

    if (i > 1) {
      cumulative_loss[i, s] <- df[selected_indeces[i], s] +
        df[selected_indeces[i - 1], s]
    }
  }
}
``````

The script runs, although it might be a naive way of doing this. The problem is that when `i == 4` it only adds the values of the fourth and third selections, whereas I want it to add the first, second, third and fourth random selections and return the total.

Here's a way to do this with `data.table` (taking into account your comment on @bgoldst's answer):

``````library(data.table); setDT(df)

#sample 4 elements of each column (i.e., every element of .SD), then cumsum them
df[ , lapply(.SD, function(x) cumsum(sample(x, 4)))]
``````
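For comparison, here is a minimal base R sketch of the case described in the question, where one set of selected rows is reused for every column. It is not part of the `data.table` approach above; the hand-picked values of `selected_indeces` are an assumption made only so the result is reproducible.

```r
# Data from the question
df <- data.frame(scores_set1 = c(32, 45, 65, 96, 45, 23, 23, 14),
                 scores_set2 = c(32, 40, 60, 98, 21, 23, 21, 63))

# Assumption: indices fixed by hand for reproducibility;
# in practice this would be sample(nrow(df), 4)
selected_indeces <- c(2, 8, 6, 3)

# Take the selected rows in order, then take the running sum of each column
cumulative_loss <- sapply(df[selected_indeces, ], cumsum)
cumulative_loss
# scores_set1: 45  59  82 147
# scores_set2: 40 103 126 186
```

Because `cumsum` already carries the running total, no explicit loop over `i` or special case for `i == 1` is needed.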

If you want to use different indices for each column, I would choose them beforehand:

``````set.seed(1023)
idx <- lapply(integer(ncol(df)), function(...) sample(nrow(df), 4))
idx
# [[1]] #indices for column 1
# [1] 2 8 6 3
#
# [[2]] #indices for column 2
# [1] 4 8 5 1
``````

Then modify the above slightly:

``````df[ , lapply( seq_along(.SD), function(jj) cumsum(.SD[[jj]][ idx[[jj]] ]) )]
``````

This is the craziest compendium of brackets/parentheses I've ever written in a functional line of code, so I guess it makes sense to break things down a bit:

• `seq_along(.SD)` gives the index number of each column, `jj`.
• `.SD[[jj]]` selects the `jj`th column and `idx[[jj]]` selects the indices for that column, so `.SD[[jj]][idx[[jj]]]` picks the `idx[[jj]]` rows of the `jj`th column; this selects the same values as `.SD[idx[[jj]], jj, with = FALSE]`.
• Lastly, we `cumsum` the `length(idx[[jj]])` rows we chose for column `jj`.

Result:

``````#     V1  V2
# 1:  45  98
# 2:  59 161
# 3:  82 182
# 4: 147 214
``````
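To trace the nested expression one step at a time, the sketch below repeats it in plain base R on an ordinary data frame. The indices in `idx` are copied by hand from the printed output above rather than re-sampled (an assumption, since `sample` can return different values across R versions), and `column_jj` and `picked` are illustrative names only.

```r
# Data from the question
df <- data.frame(scores_set1 = c(32, 45, 65, 96, 45, 23, 23, 14),
                 scores_set2 = c(32, 40, 60, 98, 21, 23, 21, 63))

# Indices as printed above, typed in by hand
idx <- list(c(2, 8, 6, 3), c(4, 8, 5, 1))

# One column, step by step
jj <- 1
column_jj <- df[[jj]]            # the whole first column as a vector
picked <- column_jj[idx[[jj]]]   # values at rows 2, 8, 6, 3: 45 14 23 65
cumsum(picked)                   # running total: 45 59 82 147

# The same steps applied to every column
result <- lapply(seq_along(df), function(jj) cumsum(df[[jj]][idx[[jj]]]))
result
```

The two elements of `result` match the `V1` and `V2` columns shown in the result above.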