rocketman - 3 months ago
R Question

R: Loop structure to use dynamically sized arrays to build linear models

With every iteration of the loop, I'd like to fit a linear model using more historical data and see how, for example, the one-step-ahead prediction compares to the actual value. The code should be self-explanatory. The problem seems to be that Dependent and Independent are fixed in size after the first iteration (which I'd like to start at 10 data points, as shown in the code), whereas I'd like them to grow with each iteration.

output1 <- rep(0, 127)
output2 <- rep(0, 127)
ret <- function(x, y)
{
  for (i in 1:127)
  {
    Dependent <- y[1:(9+i)]
    Independent <- x[1:(9+i)]
    fit <- lm(Dependent ~ Independent)
    nextInput <- data.frame(Independent = x[(10+i)])
    prediction <- predict(fit, nextInput, interval="prediction")
    output1[i] <- prediction[2]
    output2[i] <- prediction[3]
  }
}

Answer

Here's a thought, let me know if I'm close to your intent:

set.seed(42)
n <- 100
x <- rnorm(n)
head(x)
# [1]  1.3709584 -0.5646982  0.3631284  0.6328626  0.4042683 -0.1061245
y <- runif(n)
head(y)
# [1] 0.8851177 0.5171111 0.8519310 0.4427963 0.1578801 0.4423246

ret <- lapply(10:n, function(i) {
  dep <- y[1:i]
  indep <- x[1:i]
  fit <- lm(dep ~ indep)
  pred <- 
    if (i < n) {
      predict(fit, data.frame(indep = x[i+1L]), interval = "prediction")
    } else NULL
  list(fit = fit, pred = pred)
})

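Note that I'm building a list of models and predictions with lapply instead of filling vectors in a for loop. Though not exactly the same, this answer does a decent job explaining why this may be a good idea. It also sidesteps a problem in the original code: output1 and output2 are assigned inside the function, so only local copies are modified, and the function itself returns nothing useful.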
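If a for loop is preferred, it can still work as long as the results are allocated inside the function and returned from it. The following is only a minimal sketch under that assumption; rolling_pred and its start argument are hypothetical names introduced here, not part of the question, and the data are simulated with the same set.seed(42) setup used above.

```r
# Sketch: a for-loop variant that returns its results instead of
# assigning to vectors defined outside the function.
# rolling_pred and start are hypothetical names used for illustration.
rolling_pred <- function(x, y, start = 10L) {
  n <- length(x)
  steps <- n - start            # the last fit uses n - 1 points and predicts point n
  out <- matrix(NA_real_, nrow = steps, ncol = 3,
                dimnames = list(NULL, c("fit", "lwr", "upr")))
  for (k in seq_len(steps)) {
    i <- start + k - 1L         # number of points used in this fit
    d <- data.frame(dep = y[1:i], indep = x[1:i])
    fit <- lm(dep ~ indep, data = d)
    # predict() returns a 1 x 3 matrix (fit, lwr, upr); store it as row k
    out[k, ] <- predict(fit, data.frame(indep = x[i + 1L]),
                        interval = "prediction")
  }
  out
}

set.seed(42)
x <- rnorm(100)
y <- runif(100)
res <- rolling_pred(x, y)
dim(res)
head(res)
```

Each row of res holds the point prediction and the lower and upper prediction bounds for the observation that follows the fitting window, so the window grows by one point per row.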

Model and prediction from one of the runs:

ret[[50]]
# $fit
# Call:
# lm(formula = dep ~ indep)
# Coefficients:
# (Intercept)        indep  
#     0.44522      0.02691  
# $pred
#         fit        lwr      upr
# 1 0.4528911 -0.1160787 1.021861
summary(ret[[50]]$fit)
# Call:
# lm(formula = dep ~ indep)
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.42619 -0.22178 -0.00004  0.15550  0.53774 
# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)  0.44522    0.03667  12.141   <2e-16 ***
# indep        0.02691    0.03186   0.845    0.402    
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Residual standard error: 0.2816 on 57 degrees of freedom
# Multiple R-squared:  0.01236, Adjusted R-squared:  -0.004966 
# F-statistic: 0.7134 on 1 and 57 DF,  p-value: 0.4018
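To get back to the original goal of comparing each one-step-ahead prediction with the actual value, the predictions can be stacked into one table next to the realized y. This is a rough sketch using the same simulated data as above; the names cmp, n_used, error, and covered are made up for illustration.

```r
set.seed(42)
n <- 100
x <- rnorm(n)
y <- runif(n)

# Same expanding-window fits as above, keeping only the predictions.
# Each call returns the named vector c(fit, lwr, upr) for point i + 1.
pm <- t(vapply(10:(n - 1), function(i) {
  d <- data.frame(dep = y[1:i], indep = x[1:i])
  fit <- lm(dep ~ indep, data = d)
  predict(fit, data.frame(indep = x[i + 1L]), interval = "prediction")[1, ]
}, numeric(3)))
colnames(pm) <- c("fit", "lwr", "upr")

# One row per forecast: window size, prediction, bounds, realized value.
cmp <- data.frame(n_used = 10:(n - 1), pm, actual = y[11:n])
cmp$error <- cmp$actual - cmp$fit
cmp$covered <- cmp$actual >= cmp$lwr & cmp$actual <= cmp$upr

head(cmp)
mean(cmp$covered)        # share of actual values inside the prediction interval
sqrt(mean(cmp$error^2))  # RMSE of the one-step-ahead predictions
```

The interval here uses the default level of predict() for lm fits (0.95), so mean(cmp$covered) gives a quick check of how often the actual value falls inside it.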