Currently I'm using the "cube" function for balanced sampling in R. It works fine on moderate amounts of data, but when the entire population of 10,000,000+ units is used, R hangs. Is there any alternative that works with big data?
First, you should reinstall the BalancedSampling package to make sure that you have the latest version (1.4). For me, it seems to work fine for N = 10,000,000 (it takes about 30 s to select a sample):
```r
library(BalancedSampling)
N = 10000000    # population size
n = 100         # sample size
p = rep(n/N, N) # inclusion probabilities
X = cbind(p, runif(N), runif(N), runif(N)) # matrix of 3 auxiliary variables
system.time(cube(p, X))
#    user  system elapsed
#   31.31    0.02   31.42
```
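For completeness, here is a minimal sketch of what you get back: `cube()` returns the indices of the selected units, so you can subset your population data frame with the result. The scaled-down N below is purely an assumption to keep the run fast; the setup otherwise mirrors the timing example above.

```r
library(BalancedSampling)

set.seed(1)
N = 10000                 # scaled-down population (assumption, just for a quick check)
n = 100                   # sample size
p = rep(n/N, N)           # equal inclusion probabilities, summing to n
X = cbind(p, runif(N), runif(N), runif(N)) # balance on p plus 3 auxiliary variables

s = cube(p, X)            # vector of indices of the selected units
# Since p itself is among the balancing variables, the sample size is fixed at n
length(s)
head(s)
```

Because the inclusion probabilities are included as the first balancing variable, the realized sample size equals n exactly rather than varying around it.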