Tim_Utrecht - 24 days ago
R Question

Increase speed with rbindlist does not work with two for loops

I have a dataset that looks like this one:

library(data.table)
test <- data.table(Weight = sample(x = 20:100, 500, replace = TRUE), y = rnorm(500), z = rnorm(500))

> head(test)
Weight y z
1: 87 -0.7946846 -0.03136408
2: 97 1.6570765 0.61080309
3: 80 1.1592073 -0.09389739
4: 23 -0.0268602 -1.36896141
5: 32 1.3171078 -2.19978789
6: 78 -0.1961162 0.62026338


I want to duplicate each row as many times as the value in its Weight column. I have achieved this with the following code (I included a progress bar):

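# pb and Testoutcome are not defined in the post; the next two lines are assumed setup
pb <- txtProgressBar(min = 0, max = nrow(test), style = 3)
Testoutcome <- test[0]  # empty data.table with the same columns
system.time(
  for (i in 1:nrow(test)) {
    setTxtProgressBar(pb, i)
    # append row i once per unit of Weight
    for (j in 1:test[i, ]$Weight) {
      Testoutcome <- rbind(Testoutcome, test[i, ])
    }
  })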
user system elapsed
32.91 0.08 33.57


I found a post here that explains that rbindlist is much faster than rbind. So I modified the code like this:

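# assumed setup, as above: fresh progress bar and an empty result table
pb <- txtProgressBar(min = 0, max = nrow(test), style = 3)
Testoutcome <- test[0]
system.time(
  for (i in 1:nrow(test)) {
    setTxtProgressBar(pb, i)
    # same loop, with rbindlist() in place of rbind()
    for (j in 1:test[i, ]$Weight) {
      Testoutcome <- rbindlist(list(Testoutcome, test[i, ]))
    }
  })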
user system elapsed
27.72 0.05 28.31


So the gain is small. My actual dataset is about 1,000 times larger and the loop takes forever... Any ideas how to speed this up? Maybe I should move the bind outside the loop?

Answer

This should be fast, and is quite simple:

test[rep(1:.N,Weight)]
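
To see why the one-liner works, here is a small sketch on a made-up three-row table. The table `small` and its values are invented for illustration and are not from the question. It also shows, for comparison, what "moving the bind outside the loop" could look like: build the pieces in a list and call rbindlist once. I use seq_len() rather than 1:.N only because it also behaves for an empty table; that is my choice, not part of the answer above.

```r
library(data.table)

# Tiny illustrative table (hypothetical values, not the asker's data)
small <- data.table(Weight = c(2L, 3L, 1L), y = c(0.5, -1.2, 2.0))

# rep() repeats row index k Weight[k] times: 1 1 2 2 2 3
idx <- rep(seq_len(nrow(small)), small$Weight)

# Subsetting by the repeated indices duplicates the rows in one vectorised step
expanded <- small[idx]

# Alternative that keeps rbindlist but calls it only once, outside the loop
pieces <- lapply(seq_len(nrow(small)), function(k) small[rep(k, small$Weight[k])])
expanded_bind <- rbindlist(pieces)
```

Both results should have sum(Weight) rows. The slowness in the question most likely comes from growing Testoutcome one row at a time, which copies the accumulated table on every iteration, so either version above avoids that.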