tpayne - 3 months ago
R Question

Random Forest Prediction with Lags

library(randomForest)
library(dyn)
set.seed(123)
tz <- zoo(cbind(Y = rnorm(10), x = rnorm(10)))
tz[10, "Y"] <- NA
rr <- tz
rr<-cbind(`lag(Y, -1)` = lag(rr$Y, -1),rr)
fit <- dyn$randomForest(Y ~ lag(Y,-1) +x , tz, subset = seq_len(10-1))
pred <-predict(fit, newdata=rr)


I am trying to get the random forest to predict the 10th observation, but the prediction keeps coming back as NA. I think it has something to do with the lagged value, but I am not sure how this works. Does anyone know how to make this work?

Answer

I think you were adding an unnecessary line of code.

library(randomForest)
library(dyn)
set.seed(123)
tz <- zoo(cbind(Y = rnorm(10), x = rnorm(10)))

# Add the lagged column under the exact name used in the formula
rr <- tz
rr <- cbind(`lag(Y, -1)` = lag(rr$Y, -1), rr)
fit <- dyn$randomForest(Y ~ lag(Y, -1) + x, tz, subset = seq_len(10 - 1))
predict(fit, newdata = rr)
  1           2           3           4           5           6           7           8           9          10 
 0.65469597  0.63274585  0.52821489 -0.58116470 -0.28673507  0.73862391 -0.31800427 -0.59019492 -0.34942432 -0.02772214 

The extra line is tz[10, "Y"] <- NA. If you remove it, as in the code above, the 10th element is predicted.
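
If the 10th value of Y is genuinely unknown and has to stay NA, one possible workaround (not part of the answer above) is to build the lag column by hand and call randomForest directly, without dyn. The sketch below assumes a plain data frame instead of a zoo object; the column name Y_lag1 and the object names df, train and pred10 are made up for illustration.

```r
library(randomForest)

set.seed(123)
# Plain data frame instead of a zoo object (an assumption of this sketch)
df <- data.frame(Y = rnorm(10), x = rnorm(10))
df$Y[10] <- NA                       # the value we want to forecast

# Build the lag-1 predictor by hand: row i holds Y from row i - 1
df$Y_lag1 <- c(NA, head(df$Y, -1))

# Train on rows 2..9, where Y, Y_lag1 and x are all observed
train <- df[2:9, ]
fit <- randomForest(Y ~ Y_lag1 + x, data = train)

# Row 10 has a known Y_lag1 (equal to Y[9]) and a known x,
# so it can be predicted even though its own Y is NA
pred10 <- predict(fit, newdata = df[10, c("Y_lag1", "x")])
pred10
```

Because only the predictor columns are passed as newdata, the missing response in row 10 never enters the prediction step. With so few training rows the forecast itself is not meaningful; the point is only that the row is no longer dropped.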