Vinayak Vinayak - 1 month ago 55
R Question

Error in Knn, 'train' and 'class' have different lengths - R Code

I'm trying to use knn on my dataset, which has 65499 rows and 6 columns.

My dataset:

> dput(head(sampleknn))
structure(list(RequestorSeniority = c(1L, 2L, 2L, 4L, 1L, 4L),
ITOwner = c(50L, 15L, 15L, 22L, 22L, 38L), Severity = c(2L,
1L, 2L, 2L, 2L, 2L), Priority = c(0L, 1L, 0L, 0L, 1L, 3L),
daysOpen = c(3L, 5L, 0L, 20L, 1L, 0L), Satisfaction = structure(c(4L,
4L, 3L, 3L, 4L, 3L), .Label = c("Amazing", "Satisfied", "Unknown",
"Unsatisfied"), class = "factor")), .Names = c("RequestorSeniority",
"ITOwner", "Severity", "Priority", "daysOpen", "Satisfaction"
), row.names = c(NA, 6L), class = "data.frame")

> str(sampleknn)
'data.frame': 65499 obs. of 6 variables:
$ RequestorSeniority: int 1 2 2 4 1 4 3 4 2 3 ...
$ ITOwner : int 50 15 15 22 22 38 10 1 14 46 ...
$ Severity : int 2 1 2 2 2 2 2 2 2 2 ...
$ Priority : int 0 1 0 0 1 3 3 0 2 1 ...
$ daysOpen : int 3 5 0 20 1 0 9 15 6 1 ...
$ Satisfaction : Factor w/ 4 levels "Amazing","Satisfied",..: 4 4 3 3 4 3 3 3 4 4 ...


Now I'm trying to use knn on this dataset (code below) and it gives me the following error:


Error in knn(train = sampleknn_train, test = sampleknn_test, cl =
sampleknn_test_target, : 'train' and 'class' have different
lengths


Code:

library(class)  # provides knn()
sampleknn <- read.csv(file="HelpDesk.csv", header=TRUE, sep=",")
str(sampleknn)
#---scaling
normalize <- function(x) {
return((x-min(x))/(max(x)-min(x)))
}

sampleknn_n <- as.data.frame(lapply(sampleknn[ ,c(1,2,3,4,5)], normalize))
str(sampleknn_n)

#train the data from sampleknn_n
sampleknn_train <- sampleknn_n[1:65000, ]
#create a test dataset
sampleknn_test <- sampleknn_n[65001:65499, ]
#isolate test and train satisfaction levels
sampleknn_train_target <- sampleknn[1:65000, 6]
sampleknn_test_target <- sampleknn[65001:65499, 6]

#-----knn model
sqrt(65499)
m1 <- knn(train=sampleknn_train, test=sampleknn_test, cl=sampleknn_test_target,k=255)


When I run the last line (m1 <- ...), it gives me the error 'train' and 'class' have different lengths. I looked for answers that discuss the same issue, but nothing seems to work for me. What is the fix? Let me know if you need more information.

Edit:

Before Normalization:

RequestorSeniority ITOwner Severity Priority daysOpen Satisfaction
1 50 2 0 3 Unsatisfied
2 15 1 1 5 Unsatisfied
2 15 2 0 0 Unknown
4 22 2 0 20 Unknown
1 22 2 1 1 Unsatisfied
4 38 2 3 0 Unknown


After Normalization:

RequestorSeniority ITOwner Severity Priority daysOpen
0.0000000000 1.0000000000 0.50 0.0000000000 0.05555555556
0.3333333333 0.2857142857 0.25 0.3333333333 0.09259259259
0.3333333333 0.2857142857 0.50 0.0000000000 0.00000000000
1.0000000000 0.4285714286 0.50 0.0000000000 0.37037037037
0.0000000000 0.4285714286 0.50 0.3333333333 0.01851851852
1.0000000000 0.7551020408 0.50 1.0000000000 0.00000000000

> dput(head(sampleknn_n))
structure(list(RequestorSeniority = c(0, 0.333333333333333, 0.333333333333333,
1, 0, 1), ITOwner = c(1, 0.285714285714286, 0.285714285714286,
0.428571428571429, 0.428571428571429, 0.755102040816326), Severity = c(0.5,
0.25, 0.5, 0.5, 0.5, 0.5), Priority = c(0, 0.333333333333333,
0, 0, 0.333333333333333, 1), daysOpen = c(0.0555555555555556,
0.0925925925925926, 0, 0.37037037037037, 0.0185185185185185,
0)), .Names = c("RequestorSeniority", "ITOwner", "Severity",
"Priority", "daysOpen"), row.names = c(NA, 6L), class = "data.frame")

Answer

From ?knn:

cl        factor of true classifications of training set

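So cl must hold one label per row of train. You passed sampleknn_test_target (499 labels) alongside a training set of 65000 rows, which is why knn reports different lengths. Pass the training labels instead: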

m1 <- knn(train=sampleknn_train, test=sampleknn_test, cl=sampleknn_train_target,k=255)
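
The same fix can be checked on a small synthetic data set. This is only a sketch: the data frame toy, its columns, the 500/100 split and k = 23 are made-up stand-ins for the real HelpDesk.csv data, and it assumes the class package is installed. It first reproduces the error by passing the test labels as cl, then runs the corrected call and compares the predictions with the held-out labels.

```r
library(class)  # provides knn()

set.seed(42)

# Hypothetical stand-in for the help-desk data: two numeric features
# and a four-level Satisfaction factor.
n <- 600
toy <- data.frame(
  daysOpen     = runif(n, 0, 30),
  ITOwner      = runif(n, 1, 50),
  Satisfaction = factor(sample(
    c("Amazing", "Satisfied", "Unknown", "Unsatisfied"),
    n, replace = TRUE
  ))
)

# Min-max scaling, as in the question.
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
toy_n <- as.data.frame(lapply(toy[, c("daysOpen", "ITOwner")], normalize))

train_idx <- 1:500
test_idx  <- 501:600

toy_train        <- toy_n[train_idx, ]
toy_test         <- toy_n[test_idx, ]
toy_train_target <- toy$Satisfaction[train_idx]
toy_test_target  <- toy$Satisfaction[test_idx]

# Passing the test labels as cl reproduces the error from the question:
# 500 training rows, but only 100 labels.
err_msg <- tryCatch(
  {
    knn(train = toy_train, test = toy_test, cl = toy_test_target, k = 23)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)

# cl needs one label per training row.
stopifnot(nrow(toy_train) == length(toy_train_target))

# Corrected call: cl is the training target.
pred <- knn(train = toy_train, test = toy_test, cl = toy_train_target, k = 23)

# The test labels are only used afterwards, to evaluate the predictions.
print(table(predicted = pred, actual = toy_test_target))
print(mean(pred == toy_test_target))
```

Because the labels here are random, the accuracy is not meaningful. The point is only that knn returns one prediction per test row once cl matches the training set in length.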