TheChymera - 1 year ago 125

Python Question

Pandas has a really nice function, pd.DataFrame.corr(), that returns the correlation matrix of your DataFrame as another DataFrame.

The r of a correlation, however, isn't always that informative. Depending on your application, the slope of the linear regression might be just as important. Is there any function that can return that for an input matrix or DataFrame?

Other than iterating with scipy.stats.linregress(), which would be a pain, I don't see any way to do this.

Answer Source

The slope of a regression line y = b_{0} + b_{1} * x can also be calculated from the correlation coefficient: b_{1} = corr(x, y) * σ_{y} / σ_{x}

Using numpy's newaxis to create the σ_{y} / σ_{x} matrix (rows are the predictor x, columns are the response y):

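```
import numpy as np

# Entry (i, j) of the ratio is std[j] / std[i], so row i is x and column j is y.
df.corr() * (df.std().values / df.std().values[:, np.newaxis])
Out[59]:
          A         B         C
A  1.000000 -0.686981  0.252078
B -0.473282  1.000000 -0.263359
C  0.137670 -0.208775  1.000000
```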

where `df` is:

```
df
Out[60]:
   A  B  C
0  5  6  9
1  4  4  2
2  7  3  5
3  4  3  9
4  6  5  3
5  3  8  6
6  2  8  1
7  7  2  7
8  4  1  5
9  1  6  6
```
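To reproduce the numbers, the frame can be rebuilt by hand. This is only a minimal sketch: the values are copied from the printout above, and a default integer index is assumed.

```python
import pandas as pd

# Values copied column by column from the printout of `df` above.
df = pd.DataFrame({
    "A": [5, 4, 7, 4, 6, 3, 2, 7, 4, 1],
    "B": [6, 4, 3, 3, 5, 8, 8, 2, 1, 6],
    "C": [9, 2, 5, 9, 3, 6, 1, 7, 5, 6],
})
```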

And this is for verification:

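```
import itertools

import numpy as np
from scipy.stats import linregress

# Regress every column (y) on every column (x), in row-major order.
res = []
for col1, col2 in itertools.product(df.columns, repeat=2):
    res.append(linregress(df[col1], df[col2]).slope)
np.array(res).reshape(3, 3)
Out[72]:
array([[ 1.        , -0.68698061,  0.25207756],
       [-0.47328244,  1.        , -0.26335878],
       [ 0.1376702 , -0.20877458,  1.        ]])
```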
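Since corr(x, y) = cov(x, y) / (σ_{x} * σ_{y}), the same slope can also be written as cov(x, y) / var(x), which avoids building the ratio matrix by hand. The sketch below assumes the example data from above (rebuilt here so the snippet runs on its own) and wraps the idea in a helper called `slope_matrix`. That name is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def slope_matrix(frame):
    """Pairwise OLS slopes: rows are the predictor x, columns the response y."""
    # cov(x, y) / var(x); axis=0 divides each row by the variance of that row's column.
    return frame.cov().div(frame.var(), axis=0)


# Example data copied from the printout earlier in the answer.
df = pd.DataFrame({
    "A": [5, 4, 7, 4, 6, 3, 2, 7, 4, 1],
    "B": [6, 4, 3, 3, 5, 8, 8, 2, 1, 6],
    "C": [9, 2, 5, 9, 3, 6, 1, 7, 5, 6],
})

slopes = slope_matrix(df)

# Cross-check against the correlation-based expression.
sd = df.std().values
via_corr = df.corr() * (sd / sd[:, np.newaxis])
print(np.allclose(slopes, via_corr))
```

Both forms should only be expected to match `linregress` on complete data: with missing values, `corr()` and `cov()` drop NaNs pairwise, while `std()` and `var()` work per column, so the pieces may no longer line up.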