vahab vahab - 1 year ago 57
R Question

How to pull out values corresponding to a random selection and get the cumulative summation of them?

Let's say I have a data frame with two columns for now:

df<- data.frame(scores_set1=c(32,45,65,96,45,23,23,14),

I want to randomly select some rows

selected_indeces<- sample(c(1:8), 4, replace = FALSE)

Now I want to add up the values of the selected rows sequentially, meaning that for the first index I just need the value of that specific row, for the second I want the second row value + the first selected value ... and for the nth index I want the sum of all values already selected + the value of the nth row. So, I first need a matrix to put the results in


and then one loop for each column and another for each selected_index

for (s in 1:ncol(df)) #for each column
for (i in 1:length(selected_indeces)) #for each randomly selected index
if (i==1)
cumulative_loss[i,s]<- df[selected_indeces[i],s]

if (i > 1)
cumulative_loss[i,s]<- df[selected_indeces[i],s] +

The script runs, although it might be a naive way of doing this. The problem is that when i = 4 it only adds the values of the fourth and third selections, while I want it to add the first, second, third, and fourth random selections and return the total.

Answer

Here's a way to do this with data.table (taking into account your comment on @bgoldst's answer):

library(data.table); setDT(df)

#sample 4 elements of each column (i.e., every element of .SD), then cumsum them
df[ , lapply(.SD, function(x) cumsum(sample(x, 4)))]
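For comparison, here is a minimal base R sketch that reuses one set of row indices for every column, as the loop in the question does. It is only an illustration: the second column, scores_set2, and its values are invented here because the data frame in the question is cut off, and set.seed is added purely so the sketch is reproducible.

```r
# Hypothetical data: scores_set2 is invented for illustration only.
df <- data.frame(scores_set1 = c(32, 45, 65, 96, 45, 23, 23, 14),
                 scores_set2 = c(10, 20, 30, 40, 50, 60, 70, 80))

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
selected_indeces <- sample(nrow(df), 4)

# Take the chosen rows in their sampled order, then a running sum per column.
# sapply returns a 4 x 2 numeric matrix with the column names of df.
cumulative_loss <- sapply(df[selected_indeces, , drop = FALSE], cumsum)
cumulative_loss
```

If an explicit loop is preferred, the same running total comes from adding the previous result, cumulative_loss[i - 1, s], to the current row value rather than adding the previous row value.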

If you want to use different indices for each column, I would pre-choose them first:

idx <- lapply(integer(ncol(df)), function(...) sample(nrow(df), 4))
# [[1]] #indices for column 1
# [1] 2 8 6 3
# [[2]] #indices for column 2
# [1] 4 8 5 1

Then modify the above slightly:

df[ , lapply( seq_along(.SD), function(jj) cumsum(.SD[[jj]][ idx[[jj]] ]) )]

This is the craziest compendium of brackets/parentheses I've ever written in a functional line of code, so I guess it makes sense to break things down a bit:

  • seq_along(.SD) picks out the index number of each column, jj
  • .SD[[jj]] selects the jj-th column and idx[[jj]] selects the indices for that column, so .SD[[jj]][idx[[jj]]] picks the idx[[jj]] rows of the jj-th column; this is equivalent to .SD[idx[[jj]], jj, with = FALSE]
  • Lastly, we cumsum the length(idx[[jj]]) rows we chose for column jj.
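The same three steps can be spelled out one at a time on a plain data frame, which may make the nested brackets easier to follow. This is a sketch rather than the data.table call itself: it indexes df directly instead of .SD, the scores_set2 values are invented because the original data frame is cut off, and the names column, rows, picked, and running are introduced only for illustration. The indices are the ones printed in the idx example above.

```r
# Hypothetical data: scores_set2 is invented for illustration only.
df <- data.frame(scores_set1 = c(32, 45, 65, 96, 45, 23, 23, 14),
                 scores_set2 = c(10, 20, 30, 40, 50, 60, 70, 80))

# Indices copied from the idx example above (one vector per column)
idx <- list(c(2, 8, 6, 3), c(4, 8, 5, 1))

# Step by step for the first column
jj      <- 1
column  <- df[[jj]]        # the whole jj-th column as a plain vector
rows    <- idx[[jj]]       # the row indices chosen for that column
picked  <- column[rows]    # values in sampled order: 45 14 23 65
running <- cumsum(picked)  # running total: 45 59 82 147

# The same thing for every column in one line
result <- lapply(seq_along(df), function(jj) cumsum(df[[jj]][idx[[jj]]]))
```

For the first column, the running total agrees with the V1 column of the answer's output; the second column will differ because its data here is made up.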


#     V1  V2
# 1:  45  98
# 2:  59 161
# 3:  82 182
# 4: 147 214