Matek Matek - 6 days ago 5
R Question

R mlr - Creating learning curve from subset of training data and whole test data (not whole training data)?

Let's say I'm creating a learning curve like the one below (there may be small errors in the code; it's just a sample). What I want is a classical learning curve, where you enlarge the training set while keeping the validation/test set the same size.

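library(mlr)

# auc is a classification measure and needs probability predictions,
# so a classifier and a binary task are used here
learningCurve <- generateLearningCurveData(
  learners = makeLearner("classif.glmnet", predict.type = "prob"),
  task = sonar.task,
  resampling = makeResampleDesc(method = "CV", iters = 5, predict = "both"),
  percs = seq(0.1, 1, by = 0.1),
  measures = list(setAggregation(auc, train.mean), setAggregation(auc, test.mean))
)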


The problem with the code above is that the learners are indeed trained on a fraction of the training data, but the auc.train.mean measure is evaluated on the whole training set. This is not the learning curve I want. I would like this measure to evaluate performance on the fraction of the training set that was actually used for learning, as described here:

http://www.astroml.org/sklearn_tutorial/practical.html#learning-curves

I believe this sentence explains it all:


Note that when we train on a small subset of the training data, the training error is computed using this subset, not the full training set.


How can I achieve this?

Answer

For future readers: this will be fixed. Here is the GitHub issue:

https://github.com/mlr-org/mlr/issues/1357
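
In the meantime, the behaviour asked for in the question can be approximated by hand. The following is only a rough sketch, not the fix from the issue: it assumes a single fixed holdout split instead of 5-fold CV, and it uses pid.task with classif.logreg as stand-ins because both ship with mlr and work with auc. The names train.idx, sub.idx and curve are purely illustrative.

```r
library(mlr)

set.seed(1)
task <- pid.task  # binary classification task shipped with mlr (stand-in)
lrn <- makeLearner("classif.logreg", predict.type = "prob")

# Fixed holdout split: the test set stays the same for every fraction.
n <- getTaskSize(task)
test.idx <- sample(n, size = round(n / 3))
train.idx <- sample(setdiff(seq_len(n), test.idx))  # shuffled training rows

percs <- seq(0.1, 1, by = 0.1)
curve <- do.call(rbind, lapply(percs, function(p) {
  # Fraction of the training rows used for fitting at this step.
  sub.idx <- train.idx[seq_len(round(p * length(train.idx)))]
  mod <- train(lrn, task, subset = sub.idx)
  # Training AUC on the subset that was actually used for fitting.
  auc.train <- performance(predict(mod, task = task, subset = sub.idx), measures = auc)
  # Test AUC on the fixed held-out rows.
  auc.test <- performance(predict(mod, task = task, subset = test.idx), measures = auc)
  data.frame(perc = p, auc.train = unname(auc.train), auc.test = unname(auc.test))
}))
print(curve)
```

Each row of curve holds one fraction with its training AUC (computed on the subset only) and its test AUC (computed on the same held-out rows every time). Averaging several such runs over different splits would be one way to get closer to the cross-validated version.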
