TheChymera - 3 months ago
Python Question

calculate linear regression slope matrix (analogous to correlation matrix) - Python/Pandas

Pandas has a really nice function that gives you a correlation matrix DataFrame for your data: pd.DataFrame.corr().

The r of a correlation, however, isn't always that informative. Depending on your application the slope of the linear regression might be just as important. Is there any function that can return that for an input matrix or dataframe?

Other than iterating over every pair of columns with scipy.stats.linregress(), which would be a pain, I don't see any way to do this.

Answer

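The slope of a regression line y = b0 + b1 * x can also be calculated from the correlation coefficient: b1 = corr(x, y) * σy / σx

Using numpy's newaxis to build the matrix of standard-deviation ratios, where the entry in row x and column y is σy / σx, so each cell of the result is the slope from regressing the column variable (y) on the row variable (x):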

import numpy as np
df.corr() * (df.std().values / df.std().values[:, np.newaxis])
Out[59]: 
          A         B         C
A  1.000000 -0.686981  0.252078
B -0.473282  1.000000 -0.263359
C  0.137670 -0.208775  1.000000

where df is:

df
Out[60]: 
   A  B  C
0  5  6  9
1  4  4  2
2  7  3  5
3  4  3  9
4  6  5  3
5  3  8  6
6  2  8  1
7  7  2  7
8  4  1  5
9  1  6  6
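
The broadcasting step can be checked in isolation. Here is a minimal sketch with made-up standard deviations (the values are illustrative only, not taken from df): dividing a shape (3,) array by its shape (3, 1) view should give a 3x3 matrix whose entry [i, j] is std[j] / std[i].

```python
import numpy as np

# Illustrative standard deviations for three columns (made-up values).
std = np.array([1.0, 2.0, 4.0])

# Shape (3,) divided by shape (3, 1) broadcasts to shape (3, 3):
# ratio[i, j] = std[j] / std[i]
ratio = std / std[:, np.newaxis]
print(ratio)
```

Multiplying the correlation matrix elementwise by this kind of ratio matrix is what turns each r into a slope.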

And this is for verification:

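import itertools
import numpy as np
from scipy.stats import linregress
# Regress every column (col2, as y) on every column (col1, as x) and collect the slopes
res = []
for col1, col2 in itertools.product(df.columns, repeat=2):
    res.append(linregress(df[col1], df[col2]).slope)
np.array(res).reshape(3, 3)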
Out[72]: 
array([[ 1.        , -0.68698061,  0.25207756],
       [-0.47328244,  1.        , -0.26335878],
       [ 0.1376702 , -0.20877458,  1.        ]])
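
If this is needed more than once, it could be wrapped in a small helper. The sketch below is only one way to do it: the name slope_matrix is hypothetical, and it assumes all columns are numeric with no missing values. It uses the equivalent form cov(x, y) / var(x) instead of the correlation, so row x and column y again hold the slope of y regressed on x.

```python
import numpy as np
import pandas as pd


def slope_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise simple-regression slopes (hypothetical helper).

    Entry [x, y] is the slope b1 of y = b0 + b1 * x,
    i.e. cov(x, y) / var(x) = corr(x, y) * std(y) / std(x).
    """
    # Divide each row of the covariance matrix by the variance of
    # that row's column (the predictor x).
    return df.cov().div(df.var(), axis=0)


# The example data from the answer.
df = pd.DataFrame({
    "A": [5, 4, 7, 4, 6, 3, 2, 7, 4, 1],
    "B": [6, 4, 3, 3, 5, 8, 8, 2, 1, 6],
    "C": [9, 2, 5, 9, 3, 6, 1, 7, 5, 6],
})

slopes = slope_matrix(df)
# The correlation-based expression from the answer, for comparison.
via_corr = df.corr() * (df.std().values / df.std().values[:, np.newaxis])
print(slopes.round(6))
```

Both forms should agree up to floating-point error, since corr(x, y) * σy / σx and cov(x, y) / var(x) are the same quantity.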