Nicholas Parini - 1 month ago
R Question

data.frame creation with loop

Hi, I'm trying to create 10 sub-training sets (from a training set of 75%) in a loop, extracting rows randomly from a data frame (DB). I'm using:

smp_size<- floor((0.75* nrow(DB))/10)
train_ind<-sample(seq_len(nrow(DB)), size=(smp_size))

training<- matrix(ncol=(ncol(DB)), nrow=(smp_size))
for (i in 1:10){
training[i]<-DB[train_ind, ]
}


What's wrong?

Answer

To partition your dataset into 10 equally sized subsets, you may use the following:

# Randomly order the rows in your training set:
DB <- DB[order(runif(nrow(DB))), ]
# Build the sequence 1, 2, ..., 10, 1, 2, ..., 10, ... that assigns each row to a subset
inds <- rep(1:10, nrow(DB)/10)
# split() stores the subsets (defined by inds) in a list
subsets <- split(DB, inds)
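
To see what split() returns, here is a minimal sketch on a made-up data frame. The name toy and its columns are illustrative assumptions, not part of the question. The result is a named list of data frames, one per value of inds:

```r
# Hypothetical 20-row data frame, so nrow(toy) is a multiple of 10
toy <- data.frame(id = 1:20, x = seq(0.5, 10, by = 0.5))
toy <- toy[order(runif(nrow(toy))), ]   # shuffle the rows
inds <- rep(1:10, nrow(toy) / 10)       # 1..10 repeated twice
subsets <- split(toy, inds)

names(subsets)         # list names are the group labels "1" to "10"
nrow(subsets[["1"]])   # number of rows in the first subset
subsets[[3]]           # the third subset, itself a data frame
```

Each element can then be used like any other data frame, for example inside lapply().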

Note, however, that this only gives you equally sized subsets when nrow(DB) is a multiple of 10. Otherwise rep(1:10, nrow(DB)/10) truncates the repeat count, so inds is shorter than nrow(DB); split() then recycles inds and prints a warning instead of dropping the leftover observations.

If you wish to use all observations, so that some subsets are one row larger than others, use inds <- rep(1:10, length.out = nrow(DB)) instead.
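
Putting this together with the 75% training set from the question, one possible sketch looks like the following. The toy DB, its columns, and the seed are assumptions for illustration only; swap in your own data frame:

```r
# Toy data standing in for DB (hypothetical; replace with your own data frame)
set.seed(42)
DB <- data.frame(id = 1:103, x = rnorm(103))

# Draw the 75% training set first; sample() already returns rows in random order
train_ind <- sample(seq_len(nrow(DB)), size = floor(0.75 * nrow(DB)))
training <- DB[train_ind, ]

# Assign every training row to one of 10 groups; length.out makes the
# index exactly as long as the training set
inds <- rep(1:10, length.out = nrow(training))
subsets <- split(training, inds)

# Group sizes differ by at most one row
sapply(subsets, nrow)
```

Here subsets is a list of 10 data frames that together cover every training row exactly once, so no loop or pre-allocated matrix is needed.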