TheChymera - 1 year ago 218
Python Question

# Calculate linear regression slope matrix (analogous to correlation matrix) - Python/Pandas

Pandas has a really nice function, `pd.DataFrame.corr()`, that gives you a correlation matrix DataFrame for your data.

The r of a correlation, however, isn't always that informative. Depending on your application, the slope of the linear regression might be just as important. Is there any function that can return the pairwise slopes for an input matrix or DataFrame?

Other than iterating over column pairs with `scipy.stats.linregress()`, which would be a pain, I don't see any way to do this.

The slope of a regression line y = b0 + b1 * x can also be calculated from the correlation coefficient: b1 = corr(x, y) * σy / σx, where σx and σy are the standard deviations of x and y.

Using numpy's newaxis to create the σy / σx matrix (rows are x, columns are y):

```
df.corr() * (df.std().values / df.std().values[:, np.newaxis])
Out[59]:
          A         B         C
A  1.000000 -0.686981  0.252078
B -0.473282  1.000000 -0.263359
C  0.137670 -0.208775  1.000000
```
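As a cross-check, the same matrix can be obtained from the covariance matrix, because the least-squares slope of y on x equals cov(x, y) / var(x). The sketch below is not part of the original answer. It rebuilds the example data so that it runs on its own, and the names `slopes_from_cov` and `slopes_from_corr` are only illustrative.

```python
import numpy as np
import pandas as pd

# Example data, copied from the frame printed in this answer
df = pd.DataFrame({
    "A": [5, 4, 7, 4, 6, 3, 2, 7, 4, 1],
    "B": [6, 4, 3, 3, 5, 8, 8, 2, 1, 6],
    "C": [9, 2, 5, 9, 3, 6, 1, 7, 5, 6],
})

cov = df.cov()
# Divide row i by var(column i): entry [i, j] = cov(i, j) / var(i),
# i.e. the slope of regressing column j (y) on column i (x)
slopes_from_cov = cov.div(np.diag(cov.values), axis=0)

# The correlation-based expression, for comparison
std = df.std().values
slopes_from_corr = df.corr() * (std / std[:, np.newaxis])

print(np.allclose(slopes_from_cov.values, slopes_from_corr.values))
```

Both routes use the same sample statistics, so they should agree up to floating-point error.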

where `df` is:

```
df
Out[60]:
   A  B  C
0  5  6  9
1  4  4  2
2  7  3  5
3  4  3  9
4  6  5  3
5  3  8  6
6  2  8  1
7  7  2  7
8  4  1  5
9  1  6  6
```
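To reproduce the example, the frame can be rebuilt from the printed values. The answer does not show how `df` was created, so the constructor call below is only an assumption; any method that yields the same values will do.

```python
import pandas as pd

# Values copied column by column from the printed frame
df = pd.DataFrame(
    {
        "A": [5, 4, 7, 4, 6, 3, 2, 7, 4, 1],
        "B": [6, 4, 3, 3, 5, 8, 8, 2, 1, 6],
        "C": [9, 2, 5, 9, 3, 6, 1, 7, 5, 6],
    }
)
print(df)
```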

And this is for verification:

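```
import itertools
import numpy as np
from scipy.stats import linregress

res = []
# For every ordered pair, regress col2 (y) on col1 (x) and keep the slope
for col1, col2 in itertools.product(df.columns, repeat=2):
    res.append(linregress(df[col1], df[col2]).slope)
np.array(res).reshape(3, 3)
Out[72]:
array([[ 1.        , -0.68698061,  0.25207756],
       [-0.47328244,  1.        , -0.26335878],
       [ 0.1376702 , -0.20877458,  1.        ]])
```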
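The verification loop hard-codes a 3 x 3 reshape and drops the column labels. A minimal sketch of a more general check follows. `pairwise_slopes` is a hypothetical helper name, and `np.polyfit` is swapped in for `linregress` (it fits the same least-squares line), so only NumPy and pandas are needed.

```python
import numpy as np
import pandas as pd

def pairwise_slopes(frame):
    """Return a DataFrame whose [x, y] entry is the slope of y regressed on x."""
    cols = frame.columns
    out = pd.DataFrame(index=cols, columns=cols, dtype=float)
    for x in cols:
        for y in cols:
            # polyfit returns coefficients highest degree first: [slope, intercept]
            out.loc[x, y] = np.polyfit(frame[x], frame[y], 1)[0]
    return out

# Example data, copied from the frame printed in this answer
df = pd.DataFrame({
    "A": [5, 4, 7, 4, 6, 3, 2, 7, 4, 1],
    "B": [6, 4, 3, 3, 5, 8, 8, 2, 1, 6],
    "C": [9, 2, 5, 9, 3, 6, 1, 7, 5, 6],
})

check = pairwise_slopes(df)

# Compare against the vectorised correlation-based expression
std = df.std().values
vectorised = df.corr() * (std / std[:, np.newaxis])
agree = np.allclose(check.values, vectorised.values)
print(agree)
```

The explicit loop scales quadratically with the number of columns, so it is best kept as a check and the vectorised expression used for the actual computation.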