user5750238 - 24 days ago 12
R Question

# R take ten unique samples and break into training/test sets?

So my task is to break a dataframe of 506 observations into ten different samples of training and test sets (with replacement).
I'm doing this so I can put it through a model and see the average MSE over ten samples.
Thus far, I've got the following idiotically complicated for loop:

``````
temp_train <- setNames(
  lapply(1:10, function(x) {
    x <- homeprices[sample(1:nrow(homeprices), .8 * n, replace = FALSE), ]
    x
  }),
  paste0("tr_sample.", 1:10)
)
for (i in 1:length(temp_train)) {
  assign(paste0("df_train_", i), as.data.frame(temp_train[i]))
  name <- assign(paste('df_train_', i, sep = ''), x[i])
  temp_test <- setNames(homeprices[-name], paste0("te_sample.", 1:10))
  alpha <- assign(paste0("df_test_", i), as.data.frame(temp_test[i]))
}
``````

This for loop produces, say, df_test_2, which is a data frame of 506 observations of one variable. It SHOULD be a data frame of 102 observations of 13 variables, namely the 102 observations that are NOT in df_train_2.
My question, therefore, is: what's a better way to do this that actually works? I would prefer not to install any packages if possible, since I want to get a grasp of base R.

``````
x <- replicate(n = 10, expr = {sample(506, 404)})
``````

This creates a matrix in which each of the ten columns holds the row indices of a random selection of 404 rows (80% or so of 506). You'd then loop through your model fitting and use the columns of `x` to select the training subset of the data that you pass to your model. Negative indexing with the same indices yields the corresponding 20% for testing.
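
To make the index-matrix idea concrete, here is a minimal base-R sketch of the whole loop. It rests on assumptions: the `homeprices` data frame below is a simulated stand-in with invented column names (`rooms`, `age`, `price`), and `lm()` is only a placeholder for whatever model you actually fit.

```r
set.seed(1)  # only so the sketch is reproducible

# Hypothetical stand-in for the real `homeprices` data (506 rows);
# the column names are invented for illustration.
homeprices <- data.frame(rooms = rnorm(506, mean = 6),
                         age   = runif(506, 0, 100))
homeprices$price <- 5 + 3 * homeprices$rooms - 0.05 * homeprices$age + rnorm(506)

n       <- nrow(homeprices)
n_train <- floor(0.8 * n)  # 404 of 506

# One column of training row indices per repetition (a 404 x 10 matrix)
idx <- replicate(n = 10, expr = sample(n, n_train))

# For each column: fit on the training rows, score on the held-out rows
mse <- apply(idx, 2, function(train_rows) {
  train <- homeprices[train_rows, ]
  test  <- homeprices[-train_rows, ]      # negative indexing: the other 102 rows
  fit   <- lm(price ~ ., data = train)    # placeholder model (assumption)
  mean((test$price - predict(fit, newdata = test))^2)
})

mean(mse)  # average test MSE over the ten splits
```

Keeping only the indices, rather than creating `df_train_1`, `df_test_1`, and so on with `assign()`, means each training/test pair is built on demand inside the loop and always stays consistent with its column of `idx`.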