Fruitspunchsamurai - 21 days ago
R Question

KNNCAT error "some classes have only one member"

I'm trying to run a KNN analysis on auto data using the knncat function from the knncat package. My training set has around 700,000 observations. The errors below occur when I try to run the analysis. I tried to remove NAs by keeping only complete cases while reading the data in. I'm not sure what the errors mean or how to fix them.

kdata.training = kdataf[ind==1,]
kdata.test = kdataf[ind==2,]
kdata_pred = knncat(train = kdata.training, test = kdata.test, classcol = 4)



Error in knncat(train = kdata.training, test = kdata.test, classcol = 4) :
Some classes have only one member. Check "classcol"


When I run a small subset of the training and test sets (200 and 70 observations, respectively), I get the following error:

kdata_strain = kdata.training[1:200,]
kdata_stest = kdata.test[1:70,]
kdata_pred = knncat(train = kdata_strain, test = kdata_stest, classcol = 4)



Error in knncat(train = kdata_strain, test = kdata_stest, classcol = 4) :
Some factor has empty levels


Here is the output of str called on kdataf, the data frame from which the above data was sampled:

str(kdataf)
'data.frame': 1159712 obs. of 9 variables:
$ vehicle_sales_price: num 13495 11999 14499 12495 14999 ...
$ week_number: Factor w/ 27 levels "1","2","3","4",..: 11 10 13 10 10 9 18 10 10 10 ...
$ county: Factor w/ 219 levels "Anderson","Andrews",..: 49 49 49 49 49 49 49 49 49 49 ...
$ ownership_code : Factor w/ 23 levels "1","2","3","4",..: 11 11 3 1 11 11 11 11 11 11 ...
$ X30_days_late : Factor w/ 2 levels "0","1": 1 1 2 1 1 1 1 1 1 1 ...
$ X60_days_late : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 2 1 1 1 ...
$ penalty : num 0 0 55.3 0 0 ...
$ processing_time : int 28 24 32 29 19 20 63 27 28 24 ...
$ transaction_code : Factor w/ 2 levels "TITLE","WDTA": 2 2 2 2 2 2 2 2 2 2 ...


The seed was set to 1234 and the ratio of training to test data was 2:1.

Answer

First, I know very little about R, so take my answer with a grain of salt. I had the same problem, which made no sense because there were no NAs. At first I thought the cause was unusual characters in my data, such as ' and /. But no: knncat works with those characters once I put the following lines of code after defining my training set (I use data.table because my data are huge):

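write.csv(train, file="train.csv")  # row names are written as an unnamed first column
train <- fread("train.csv", sep=",", header=TRUE, stringsAsFactors=TRUE)  # data.table::fread; factor levels are rebuilt from the values in the file
train[,V1:=NULL]  # drop the row-name column, which fread names V1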

After that, the messages 'Some factor has empty levels' and 'Some classes have only one member. Check "classcol"' no longer appear. I know this is not a real solution to the problem, but at least you can finish your work. Hope it helps.
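
A possible explanation, offered as an assumption rather than something confirmed against the knncat source: subsetting a data frame keeps every original factor level, so a 200-row sample can still carry all 219 county levels even though most have no rows. Writing to CSV and reading back rebuilds the levels from the values actually present, which would explain why the round trip makes the messages go away. Note that fread's stringsAsFactors only converts character columns, so a factor with numeric-looking labels (such as week_number in the question) would likely come back as a number rather than a factor. The sketch below shows a more direct route using base R's table() and droplevels() on a toy data frame; the column names and values are hypothetical, and dropping rare classes is a modelling choice, not something the error message prescribes.

```r
# Toy stand-in for the training data (hypothetical values, not the asker's data)
train <- data.frame(
  price = c(13495, 11999, 14499, 12495, 14999),
  owner = factor(c("1", "1", "3", "11", "1"), levels = c("1", "2", "3", "11"))
)

# Count the members of each class; levels with 0 or 1 rows are the suspects
class_counts <- table(train$owner)
print(class_counts)

# Keep only rows whose class occurs at least twice
keep <- names(class_counts)[class_counts >= 2]
train_sub <- train[train$owner %in% keep, ]

# Subsetting keeps all original levels, so the unused ones are still there
nlevels(train_sub$owner)            # 4

# droplevels() removes unused levels from every factor column
train_ok <- droplevels(train_sub)
nlevels(train_ok$owner)             # 1
```

On the real data, the same check would be table(kdata.training[[4]]) before calling knncat, since classcol = 4 points at the fourth column (ownership_code in the str output above), followed by droplevels() on both the training and test sets.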