heyydrien - 2 months ago
R Question

How to create a confusion matrix for a decision tree model

I am having difficulty creating a confusion matrix to compare my model's predictions to the actual values. My data set has 159 explanatory variables, and my target is called "classe".

#Load packages and data
library(caret)   # createDataPartition(), confusionMatrix()
library(rpart)   # rpart()

df <- read.csv("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv", na.strings=c("NA","#DIV/0!",""))

#Split into training and validation
index <- createDataPartition(df$classe, times=1, p=0.5)[[1]]
training <- df[index, ]
validation <- df[-index, ]

#Model
decisionTreeModel <- rpart(classe ~ ., data=training, method="class", cp=0.5)

#Predict
pred1 <- predict(decisionTreeModel, validation)

#Check model performance
confusionMatrix(validation$classe, pred1)


The following error message is generated from the code above:

Error in confusionMatrix.default(validation$classe, pred1) :
The data must contain some levels that overlap the reference.


I think it may have something to do with the pred1 variable that the predict function generates: it's a matrix with 5 columns, while validation$classe is a factor with 5 levels. Any ideas on how to solve this?

Thanks in advance

Joy
Answer

Your predict() call is returning a matrix of class probabilities, one column per class. If you want the predicted class (the "winner") instead, replace your predict line with this:

pred1 <- predict(decisionTreeModel, validation, type="class")
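A minimal, self-contained sketch of the difference, using the built-in iris data as a stand-in for the pml data set (which requires a download); the column names here are illustrative, not the asker's:

```r
library(rpart)   # ships with R; provides rpart() and predict.rpart()

# Fit a small classification tree
fit <- rpart(Species ~ ., data = iris, method = "class")

# Without a type argument, predict.rpart() on a classification tree
# returns a matrix of class probabilities, one column per class --
# this is what triggered the confusionMatrix() error above
probs <- predict(fit, iris)

# With type = "class" it returns the winning class as a factor,
# which is what confusionMatrix() expects
pred <- predict(fit, iris, type = "class")

# caret's confusionMatrix() takes the predictions first and the
# reference (actual) labels second
if (requireNamespace("caret", quietly = TRUE)) {
  print(caret::confusionMatrix(pred, iris$Species))
}
```

Note the argument order in confusionMatrix(data, reference): your original call passed validation$classe first, which still runs once both inputs are factors, but it swaps the roles of predictions and actuals in the per-class statistics.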