I have the following OLS model from StatsModels:
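import statsmodels.api as sm
import statsmodels.tools.tools

X = df['Grade']
y = df['Results']
X = statsmodels.tools.tools.add_constant(X)  # prepend a column of ones for the intercept
mod = sm.OLS(y, X)
results = mod.fit()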
You are adding a constant to the regression equation with X = statsmodels.tools.tools.add_constant(X), so your regressor X has two columns, the first of which is an array of ones.
You need to do the same with the regressor that is used in prediction. So the 1 in the constant's position of the prediction input means include the constant in the prediction. If you use zero instead, then the contribution of the constant (0 * params) is zero and the prediction is only the slope effect.
The formula interface adds the constant automatically both for the regressor in the model and for the regressor in the prediction. However, with the pandas DataFrame or numpy ndarray interface, the constant needs to be added by the user both for the model and for predict.