I am having some difficulties creating a confusion matrix to compare my model prediction to the actual values. My data set has 159 explanatory variables and my target is called "classe".
library(caret)
library(rpart)
df <- read.csv("https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv", na.strings=c("NA","#DIV/0!",""))
#Split into training and validation
index <- createDataPartition(df$classe, times=1, p=0.5, list=FALSE)
training <- df[index, ]
validation <- df[-index, ]
decisionTreeModel <- rpart(classe ~ ., data=training, method="class", cp =0.5)
pred1 <- predict(decisionTreeModel, validation)
#Check model performance
confusionMatrix(validation$classe, pred1)

Error in confusionMatrix.default(validation$classe, pred1) :
The data must contain some levels that overlap the reference.
Your `predict` call is returning a matrix of class probabilities (one column per class), so `confusionMatrix` cannot match it against the factor `validation$classe`. If you want the predicted class (the "winner") instead, replace your predict line with this:
pred1 <- predict(decisionTreeModel, validation, type="class")
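With that change, a minimal end-to-end sketch looks like the following. Note that `confusionMatrix(data, reference)` conventionally takes the predictions first and the true labels second, and both should be factors with matching levels; the explicit `factor()` calls here are a defensive assumption, not something your original code had:

```r
library(caret)
library(rpart)

# Same split as above
index <- createDataPartition(df$classe, times=1, p=0.5, list=FALSE)
training   <- df[index, ]
validation <- df[-index, ]

decisionTreeModel <- rpart(classe ~ ., data=training, method="class")

# type="class" returns predicted labels rather than a probability matrix
pred1 <- predict(decisionTreeModel, validation, type="class")

# Predictions first, reference second, with aligned factor levels
truth <- factor(validation$classe)
confusionMatrix(factor(pred1, levels=levels(truth)), truth)
```

One more thing to watch: `cp=0.5` is a very aggressive complexity penalty and can prune the tree down to a stump that predicts a single class, which by itself can trigger the "levels that overlap" error even with `type="class"`. The sketch above uses rpart's default `cp` for that reason.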