F.N.B F.N.B - 4 months ago 67
Python Question

confidence and prediction intervals with StatsModels

I do this linear regression with StatsModels:

import numpy as np
import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std

#measurements genre
nmuestra = 100

x = np.linspace(0, 10, nmuestra)
e = np.random.normal(size=nmuestra)
y = 1 + 0.5*x + 2*e
X = sm.add_constant(x)

re = sm.OLS(y, X).fit()
print re.summary() #print the result type Stata

prstd, iv_l, iv_u = wls_prediction_std(re)


My questions are, iv_l and iv_u are the upper and lower confidence intervals or prediction intervals?? How I get the other?? (I need the confidence and prediction intervals for all point, to do as plot)

Answer

iv_l, iv_u gives you the limits of the prediction interval for each point.

see the first plot here http://statsmodels.sourceforge.net/devel/examples/generated/example_ols.html

Prediction interval is the confidence interval for an observation and includes the estimate of the error.

I think, confidence interval for the mean prediction is not yet available in statsmodels. (Actually, the confidence interval for the fitted values is hiding inside the summary_table of influence_outlier, but I need to verify this.)

Proper prediction methods for statsmodels are on the TODO list.

Addition

Confidence intervals are there for OLS but the access is a bit clumsy.

To be included after running your script:

from statsmodels.stats.outliers_influence import summary_table

st, data, ss2 = summary_table(re, alpha=0.05)

fittedvalues = data[:,2]
predict_mean_se  = data[:,3]
predict_mean_ci_low, predict_mean_ci_upp = data[:,4:6].T
predict_ci_low, predict_ci_upp = data[:,6:8].T

# check we got the right things
print np.max(np.abs(re.fittedvalues - fittedvalues))
print np.max(np.abs(iv_l - predict_ci_low))
print np.max(np.abs(iv_u - predict_ci_upp))

plt.plot(x, y, 'o')
plt.plot(x, fittedvalues, '-', lw=2)
plt.plot(x, predict_ci_low, 'r--', lw=2)
plt.plot(x, predict_ci_upp, 'r--', lw=2)
plt.plot(x, predict_mean_ci_low, 'r--', lw=2)
plt.plot(x, predict_mean_ci_upp, 'r--', lw=2)
plt.show()

enter image description here

This should give the same results as SAS, http://jpktd.blogspot.ca/2012/01/nice-thing-about-seeing-zeros.html