Megatron - 2 months ago
R Question

Repeated balanced k-fold cross validation using caret in R

I would like to perform repeated k-fold cross-validation using the caret package. This can be specified with the trainControl() function.
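For reference, a minimal repeated-CV setup might look like the sketch below. The values k = 10 and n = 3 are only examples, and method = "knn" is an arbitrary model choice for illustration (caret fits it with its own knn3(), so no extra package is needed); none of this comes from the question itself.

```r
library(caret)

# 10-fold cross-validation repeated 3 times (example values for k and n)
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)

# "knn" is an arbitrary illustrative model, not part of the original question
set.seed(123)
fit <- train(Species ~ ., data = iris, method = "knn", trControl = ctrl)

# One row of resampled performance per fold and repetition: 10 * 3 = 30
nrow(fit$resample)
```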

My question is: are the folds created using trainControl(method="repeatedcv", number=k, repeats=n) balanced? Are these k-folds generated the same way as the balanced ones generated by createFolds()?




For clarity, here are examples of balanced and unbalanced k-folds:

The iris species breakdown:

table(iris$Species)
#     setosa versicolor  virginica
#         50         50         50


Now, we create random unbalanced and balanced folds:

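library(caret)  # provides createFolds()

k <- 10

# Unbalanced: assign fold IDs 1..k at random, ignoring the class labels
unbalanced <- sample(rep(seq_len(k), length.out = length(iris$Species)))

# Balanced: createFolds() stratifies on the class labels
bList <- createFolds(iris$Species, k)

# Convert the list of held-out indices into a vector of fold IDs
balanced <- rep(-1, length(iris$Species))
for (i in seq_len(k)) balanced[bList[[i]]] <- i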


Next, we tabulate the class breakdown of every fold in each set of k-folds.

classBreakdownTable <- function(i, folds) table(as.factor(iris$Species)[which(folds == i)])

sapply(seq_len(k), classBreakdownTable, unbalanced)
#            [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# setosa        4    6    8    4    4    4    7    6    5     2
# versicolor    5    5    1    5    5    7    4    6    6     6
# virginica     6    4    6    6    6    4    4    3    4     7

sapply(seq_len(k), classBreakdownTable, balanced)
#            [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# setosa        5    5    5    5    5    5    5    5    5     5
# versicolor    5    5    5    5    5    5    5    5    5     5
# virginica     5    5    5    5    5    5    5    5    5     5
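As a possible shortcut, createFolds() can return the fold IDs directly when called with list = FALSE, which removes the need for the reformatting loop, and a two-way table() then gives the whole breakdown in one call. The seed below is arbitrary, so the fold assignment will differ from the run shown above, but the counts should not.

```r
library(caret)

set.seed(1)  # arbitrary seed; fold assignment is random
k <- 10

# list = FALSE returns one fold ID (1..k) per observation instead of a list
fold_id <- createFolds(iris$Species, k = k, list = FALSE)

# Rows: species; columns: folds (each cell should be 50 / 10 = 5)
breakdown <- table(iris$Species, fold_id)
breakdown
```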

Answer

The answer is yes.

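When method = "repeatedcv" (and no custom index is supplied in trainControl()), train() builds its resampling indices with createMultiFolds(), which internally calls createFolds() once per repetition, i.e. n times as specified by repeats = n. The folds are therefore stratified on the outcome in the same way as those from createFolds().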
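One way to check this empirically is sketched below, assuming k = 10, n = 3 and an arbitrary "knn" model purely for illustration. Note that createMultiFolds() (and therefore the index stored by train()) holds the *training* rows of each resample, so the held-out rows are obtained by negative indexing.

```r
library(caret)

set.seed(2024)  # arbitrary seed; fold assignment is random

# k * times lists of training-row indices, as used for method = "repeatedcv"
multi <- createMultiFolds(iris$Species, k = 10, times = 3)

# Count each species among the held-out rows of one resample
held_out <- function(train_rows) table(iris$Species[-train_rows])
direct_counts <- sapply(multi, held_out)  # 3 x 30 matrix

# The same check on the indices that train() itself stored
ctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
fit <- train(Species ~ ., data = iris, method = "knn", trControl = ctrl)
train_counts <- sapply(fit$control$index, held_out)

# Every held-out fold should contain exactly 5 of each species
all(direct_counts == 5)
all(train_counts == 5)
```

For a numeric outcome, createFolds() instead splits y into percentile-based groups and samples within them, so the folds are balanced on the outcome's distribution rather than on class labels.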