Fidel Mercado - 2 months ago
R Question

R - getting the error: "Invalid argument: 'sim' & 'obs' doesn't have the same length !"

I've been learning R in RStudio and have been working on simple prediction modeling.

I receive the following error:

Invalid argument: 'sim' & 'obs' doesn't have the same length !

when I run this line of code:

rmse(testingbabydata$weight, predictedWeight)

The dataset linked here contains 1000 rows and the global environment pane shows that my testing data and my training data have "500 obs. of 2 variables" each.

The library should already be loaded properly too.

This is my code snippet wherein I attempt to predict a baby's weight based on the length of the pregnancy in weeks:

ncbabydata=read.csv("nc.csv",header=TRUE,stringsAsFactors = FALSE)
model = train(weight ~.,trainingbabydata,method="rf")
rmse(testingbabydata$weight, predictedWeight)

Thank you for your time! (I did attempt to google this error message first but found no suitable source that I could understand relatively easily.)


Your two vectors are, in fact, not the same length:

> length(predictedWeight)
[1] 498
> length(testingbabydata$weight)
[1] 500

The reason for this is that some of your feature values are NA, and your prediction simply omits those rows. Handling missing data in models is a complex topic, but since it's only two rows out of 500, you can just remove them for now (for example with complete.cases() or na.omit()) and continue your learning.
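
A minimal sketch of removing the incomplete rows, using base R's complete.cases() on a small made-up data frame. The data frame toy, its column names, and its values are illustrative assumptions and are not taken from nc.csv:

```r
# Toy stand-in for the testing data; the values are made up for illustration.
toy <- data.frame(
  weeks  = c(39, NA, 41, 38, NA),
  weight = c(7.6, 6.9, 8.1, 7.0, 7.4)
)

# complete.cases() returns TRUE for rows that contain no NA in any column.
keep <- complete.cases(toy)
toy_clean <- toy[keep, ]

nrow(toy)        # 5 rows before filtering
nrow(toy_clean)  # 3 rows after filtering
```

Applied to the data in the question, the same idea would presumably look like testingbabydata <- testingbabydata[complete.cases(testingbabydata), ], after which both vectors should have the same length.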


You can then calculate your RMSE (which you can also do directly, without a helper function):

> sqrt(mean((testingbabydata$weight-predictedWeight)^2))
[1] 1.025823

You can compare it to a baseline model that always predicts the mean value:

> sqrt(mean((testingbabydata$weight-mean(testingbabydata$weight))^2))
[1] 1.460638
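
The two calculations above can be wrapped in a small function that also checks the lengths up front. This is only a sketch: rmse_checked is a hypothetical name, not a function from any package, and the obs and sim vectors are made-up values rather than the question's data:

```r
# Hypothetical helper; rmse_checked is not part of any package.
rmse_checked <- function(obs, sim) {
  # Fail early, as in the question's error, if the vectors cannot be paired.
  stopifnot(length(obs) == length(sim))
  sqrt(mean((obs - sim)^2))
}

# Made-up observed and predicted values for illustration.
obs <- c(7.6, 8.1, 7.0, 6.5)
sim <- c(7.4, 7.9, 7.3, 6.9)

# RMSE of the predictions, and of a baseline that always predicts the mean.
model_rmse    <- rmse_checked(obs, sim)
baseline_rmse <- rmse_checked(obs, rep(mean(obs), length(obs)))

model_rmse < baseline_rmse  # TRUE for these toy values
```

A model is only doing useful work if its RMSE is below the mean-only baseline, which is the comparison made above with 1.025823 against 1.460638.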