TimelordViktorious - 2 years ago

Python Question

I have a dataset with about 100+ features. I also have a small set of covariates.

I build an OLS linear model using statsmodels of the form y = x + C1 + C2 + C3 + C4 + ... + Cn, where y is the dependent variable, x is a single feature, and C1, ..., Cn are the covariates.

I'm trying to perform hypothesis testing on the regression coefficients to test if the coefficients are equal to 0. I figured a t-test would be the appropriate approach to this, but I'm not quite sure how to go about implementing this in Python, using statsmodels.

I know, particularly, that I'd want to use http://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.RegressionResults.t_test.html#statsmodels.regression.linear_model.RegressionResults.t_test

But I am not certain I understand the r_matrix parameter. What could I provide to this? I did look at the examples but it is unclear to me.

Furthermore, I am not interested in doing the t-tests on the covariates themselves, just on the regression coefficient of x.

Any help appreciated!


Answer Source

Are you sure you don't want `statsmodels.regression.linear_model.OLS`? This will perform an OLS regression, making available the parameter estimates and the corresponding p-values (and many other things).

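```
import numpy as np
from statsmodels.regression import linear_model
from statsmodels.api import add_constant

Y = [1, 2, 3, 5, 6, 7, 9]
# Design matrix: a column of ones (intercept) plus the regressor 0, 1, ..., 6
X = add_constant(np.arange(len(Y)))
model = linear_model.OLS(Y, X)
results = model.fit()
print(results.params)   # intercept and slope: [ 0.75 1.32142857]
print(results.pvalues)  # one p-value per parameter: [ 2.00489220e-02 4.16826428e-06]
```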

These p-values come from two-sided t-tests of the null hypothesis that each fitted parameter equals 0.
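For the setup in the question (many features, each fitted alongside the same covariates), one possible approach is to loop over the features and keep only the entry for x from each fit. The following is a rough sketch on synthetic data, not a definitive recipe: the column names `feat_a`, `feat_b`, `C1`, `C2` and the generated data are made up for illustration, and it assumes the formula interface (`statsmodels.formula.api.ols`) together with pandas.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (hypothetical names): two features and two covariates.
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "C1": rng.normal(size=n),
    "C2": rng.normal(size=n),
    "feat_a": rng.normal(size=n),
    "feat_b": rng.normal(size=n),
})
# y depends on feat_a and C1 only; feat_b has no true effect.
df["y"] = 1.5 * df["feat_a"] + 0.5 * df["C1"] + rng.normal(size=n)

rows = []
for feature in ["feat_a", "feat_b"]:
    # One model per feature: y ~ x + C1 + C2 (intercept added automatically).
    res = smf.ols(f"y ~ {feature} + C1 + C2", data=df).fit()
    # Keep only the statistics for the feature, ignoring the covariates.
    rows.append({
        "feature": feature,
        "coef": res.params[feature],
        "t": res.tvalues[feature],
        "p": res.pvalues[feature],
    })

summary = pd.DataFrame(rows).set_index("feature")
print(summary)
```

With 100+ features this produces 100+ p-values, so a multiple-testing correction is probably worth considering before interpreting them.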

It seems like `RegressionResults.t_test` would be useful for less conventional hypotheses.
