syebill - 29 days ago
R Question

How to define weights for gbm in the caret package, and why use the Kappa statistic for class-imbalanced data sets

I would like to define case weights for a gbm model trained through the caret package. The "train" function in caret has a "weights" parameter, but its description says "This argument will only affect models that allow case weights". As I understand it, gbm does support weights, but I do not know what format they should take. Is it simply c(1,10), where 1 is for the majority class and 10 is for the minority class?

My second question is about the Kappa statistic. I have read that Kappa is a better performance metric for class-imbalanced data sets, but I do not understand why. I would appreciate some guidance on why Kappa is a better performance metric than ROC for class-imbalanced data sets.

Thanks.

Answer

To the best of my knowledge, gbm does support case weights, and the weights should be a vector with one entry per row of the data frame, not one entry per class. If you are only using two classes, I believe you will have to use ROC. I'm not sure I'm qualified to answer your question on ROC vs. Kappa, but here is a paper from 2013 looking at the performance of several metrics on real-world data. The general takeaway seems to be that while Kappa can be affected by skew (ROC seems to be relatively immune), ROC tends to mask poor performance.
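
As a rough illustration of the per-row weight vector, here is a minimal sketch on made-up data. The data frame, the column names (x1, x2, y), the class labels, and the 1-vs-10 weighting (borrowed from the question) are all assumptions for the example, not a recommendation. The second half computes accuracy and Cohen's Kappa by hand for a classifier that always predicts the majority class, which is one way to see why Kappa is often suggested for imbalanced data: accuracy looks good while Kappa shows no agreement beyond chance. The caret call only runs if caret and gbm are installed.

```r
set.seed(1)

# Made-up imbalanced data: 180 majority rows, 20 minority rows (assumption).
n_major <- 180
n_minor <- 20
df <- data.frame(
  x1 = rnorm(n_major + n_minor),
  x2 = rnorm(n_major + n_minor),
  y  = factor(c(rep("majority", n_major), rep("minority", n_minor)))
)

# One weight per ROW, not per class: 10 for minority rows, 1 otherwise.
# The values 1 and 10 are taken from the question and are only illustrative.
w <- ifelse(df$y == "minority", 10, 1)

# Fit through caret only if both packages are available.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("gbm", quietly = TRUE)) {
  fit <- caret::train(
    x = df[, c("x1", "x2")],
    y = df$y,
    method = "gbm",
    weights = w,  # same length as nrow(df)
    trControl = caret::trainControl(method = "cv", number = 3),
    verbose = FALSE
  )
}

# Accuracy vs. Kappa for a classifier that always predicts "majority".
pred <- factor(rep("majority", nrow(df)), levels = levels(df$y))
tab  <- table(pred, df$y)
n    <- sum(tab)

accuracy   <- sum(diag(tab)) / n                      # observed agreement
expected   <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement by chance
kappa_stat <- (accuracy - expected) / (1 - expected)

print(c(accuracy = accuracy, kappa = kappa_stat))
```

Here the always-majority classifier agrees with the labels exactly as often as chance would predict given the class frequencies, so Kappa is zero even though most rows are classified correctly.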