user3655009 - 4 months ago
R Question

Imbalanced dataset (6% Yes, 94% No): every classifier (ANN, C4, CART) in SPSS predicts No for all test cases. What do I do?

The data has around 2500 rows and 85 columns.

Answer

Look into oversampling techniques, for example the SMOTE function in the DMwR package for R.

Here is a short tutorial: http://amunategui.github.io/smote/
And here is a YouTube video: https://www.youtube.com/watch?v=1Mt7EuVJf1A

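The SMOTE function oversamples your rare event by synthetically creating additional observations of it: it samples existing rare-event rows, finds their k nearest neighbors among the other rare-event rows, and generates new rows that lie between a sampled row and one of those neighbors. As a rule of thumb, an event is usually considered rare when the outcome (dependent/target/response) variable takes that value less than 15% of the time, so your 6% Yes rate qualifies.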
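To make the idea concrete, here is a minimal sketch of SMOTE-style interpolation in plain Python. It is not the DMwR implementation, just an illustration under some assumptions: all features are numeric, distance is Euclidean, and the function name `smote`, its parameters, and the toy `yes_rows` data are made up for this example.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority rows.

    minority: list of equal-length numeric feature lists (rare class only).
    This is an illustrative sketch, not the DMwR::SMOTE algorithm verbatim.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority row (sampling with replacement).
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to it.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        # Choose one of the k nearest minority neighbors.
        neighbor = rng.choice(others[:k])
        # Place the new point somewhere on the segment base -> neighbor.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy data: hypothetical "Yes" rows with two numeric features.
yes_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_rows = smote(yes_rows, n_synthetic=8, k=3)
```

Each synthetic row lies between two real Yes rows, so the classifier sees more Yes examples without exact duplicates. Apply this kind of oversampling to the training set only and leave the test set untouched, otherwise the evaluation no longer reflects the real 6%/94% split. Categorical columns need different handling than the numeric interpolation shown here.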