tpayne - 9 months ago 84
R Question

# Random Forest Prediction with Lags

library(randomForest)
library(dyn)
set.seed(123)
tz <- zoo(cbind(Y = rnorm(10), x = rnorm(10)))
tz[10, "Y"] <- NA
rr <- tz
rr <- cbind(`lag(Y, -1)` = lag(rr$Y, -1), rr)
fit <- dyn$randomForest(Y ~ lag(Y, -1) + x, tz, subset = seq_len(10 - 1))
pred <- predict(fit, newdata = rr)

I am trying to get the random forest to predict the 10th observation, but the prediction keeps coming back as NA. I think it has something to do with the lagged value, but I am not sure how this works. Does anyone know how to make this work?

I think you were adding an unnecessary line of code.

set.seed(123)
tz <- zoo(cbind(Y = rnorm(10), x = rnorm(10)))

tz <- zoo(cbind(Y = rnorm(10), x = rnorm(10)))

rr <- tz
rr <- cbind(`lag(Y, -1)` = lag(rr$Y, -1), rr)
fit <- dyn$randomForest(Y ~ lag(Y, -1) + x, tz, subset = seq_len(10 - 1))
predict(fit, newdata = rr)
          1           2           3           4           5           6           7           8           9          10
 0.65469597  0.63274585  0.52821489 -0.58116470 -0.28673507  0.73862391 -0.31800427 -0.59019492 -0.34942432 -0.02772214

The extra line is tz[10, "Y"] <- NA. If you remove it, as in the code above, the 10th element is predicted.
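
If the 10th value of Y is genuinely unknown (which is presumably why it was set to NA in the question), one alternative is to build the lag column by hand and fit with plain randomForest instead of dyn. This is only a minimal sketch of that swapped-in approach, not the dyn method from the question. The data frame d and the column name Y_lag1 are names made up for illustration, and it assumes the previous observation (Y at row 9) is known.

```r
library(randomForest)

set.seed(123)
n <- 10
d <- data.frame(Y = rnorm(n), x = rnorm(n))
d$Y[n] <- NA                       # the value we want to predict

# Hand-built lag (hypothetical column name): row i holds Y from row i - 1
d$Y_lag1 <- c(NA, d$Y[-n])

# Train on rows 2..9: row 1 has no lag, row 10 has no response
train <- d[2:(n - 1), ]
fit <- randomForest(Y ~ Y_lag1 + x, data = train)

# Pass only the predictor columns for row 10; both are complete
pred10 <- predict(fit, newdata = d[n, c("Y_lag1", "x")])
pred10
```

Here predict() is given only the predictor columns, so the missing Y in row 10 never enters the prediction step. With 8 training rows this is a toy example, so the predicted value itself should not be read into.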

Source (Stack Overflow)