Fidel Mercado - 9 months ago 36

R Question

I've been learning R in RStudio and have been working on simple predictive modeling.

I receive the following error:

Invalid argument: 'sim' & 'obs' doesn't have the same length !

when I run this line of code:

`rmse(testingbabydata$weight, predictedWeight)`

The dataset linked here contains 1000 rows and the global environment pane shows that my testing data and my training data have "500 obs. of 2 variables" each.

The `rmse` function comes from the `hydroGOF` library.

This is my code snippet wherein I attempt to predict a baby's weight based on the length of the pregnancy in weeks:

```
library(caret)     # assumed source of train()
library(hydroGOF)  # provides rmse()

ncbabydata = read.csv("nc.csv", header = TRUE, stringsAsFactors = FALSE)

# odd-numbered rows for training, even-numbered rows for testing
trainingbabydata = ncbabydata[seq(1, nrow(ncbabydata), 2), c("weeks", "weight")]
testingbabydata = ncbabydata[seq(2, nrow(ncbabydata), 2), c("weeks", "weight")]

model = train(weight ~ ., trainingbabydata, method = "rf")
predictedWeight = predict(model, testingbabydata)

rmse(testingbabydata$weight, predictedWeight)
```

Thank you for your time! (I did attempt to google this error message first but found no suitable source that I could understand relatively easily.)

Answer Source

Your two vectors are, in fact, not the same length:

```
> length(predictedWeight)
[1] 498
> length(testingbabydata$weight)
[1] 500
```

The reason for this is that some of your feature values are `NA`, and `predict` simply omits those rows. Handling missing data in models is a complex topic, but since it's only two rows out of 500, you can just remove them for now and continue learning:

```
testingbabydata<-testingbabydata[complete.cases(testingbabydata),]
```
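If it helps to see where the missing values are before dropping anything, a check along the following lines is one option. This is only a minimal sketch on a made-up data frame: `toy` and its values are hypothetical stand-ins for `testingbabydata`, not the real `nc.csv` data.

```r
# Hypothetical toy data standing in for the testing set (not the real nc.csv).
toy <- data.frame(
  weeks  = c(39, NA, 40, 37, NA),
  weight = c(7.6, 6.9, 8.1, 6.2, 7.0)
)

# Count missing values per column to see which variable is affected.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Row positions that contain at least one NA.
incomplete_rows <- which(!complete.cases(toy))
print(incomplete_rows)

# Keep only the complete rows, the same filter as in the line above.
toy_complete <- toy[complete.cases(toy), ]
print(nrow(toy_complete))
```

Running the same `colSums(is.na(...))` call on `testingbabydata` should show which of the two columns holds the missing values.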

Since `predictedWeight` already excludes those two rows, the two vectors now have the same length, and you can calculate your RMSE (which you can also do directly, without a helper):

```
> sqrt(mean((testingbabydata$weight-predictedWeight)^2))
[1] 1.025823
```

and you can compare it to a model which always predicts the mean value:

```
> sqrt(mean((testingbabydata$weight-mean(testingbabydata$weight))^2))
[1] 1.460638
```
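The two calculations above can also be wrapped in a small function that fails early when the vectors cannot be paired. The sketch below is just one way to do it: `rmse_pairs` is a hypothetical helper (it is not part of `hydroGOF`), and the numbers are made up for illustration.

```r
# Hypothetical helper (not part of hydroGOF): RMSE over paired values only.
rmse_pairs <- function(obs, sim) {
  # Fail early if the vectors cannot be paired up element by element.
  stopifnot(length(obs) == length(sim))
  # Drop pairs where either the observed or the predicted value is missing.
  keep <- !is.na(obs) & !is.na(sim)
  sqrt(mean((obs[keep] - sim[keep])^2))
}

# Made-up observed and predicted values for illustration.
obs <- c(7.0, 8.0, 6.0, 7.5)
sim <- c(7.5, 7.5, 6.5, 7.0)

# RMSE of the predictions, and of a baseline that always predicts the mean.
model_rmse    <- rmse_pairs(obs, sim)
baseline_rmse <- rmse_pairs(obs, rep(mean(obs), length(obs)))

print(model_rmse)
print(baseline_rmse)
```

A model is only useful if its RMSE comes out below the mean-only baseline, which is the comparison made above with the real data.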