TheChymera - 1 year ago 178

Python Question

Pandas has a really nice function, pd.DataFrame.corr(), that returns the correlation matrix of your data as a DataFrame.

The r of a correlation, however, isn't always that informative. Depending on your application the slope of the linear regression might be just as important. Is there any function that can return that for an input matrix or dataframe?

Other than iterating with scipy.stats.linregress(), which would be a pain, I don't see any way to do this.


Answer Source

The slope of a regression line y = b_{0} + b_{1} * x can also be calculated from the correlation coefficient: b_{1} = corr(x, y) * σ_{y} / σ_{x}

Using numpy's newaxis to build the σ_{y} / σ_{x} matrix (the row label is x, the column label is y):

```
import numpy as np
df.corr() * (df.std().values / df.std().values[:, np.newaxis])
Out[59]:
          A         B         C
A  1.000000 -0.686981  0.252078
B -0.473282  1.000000 -0.263359
C  0.137670 -0.208775  1.000000
```

where `df` is:

```
df
Out[60]:
   A  B  C
0  5  6  9
1  4  4  2
2  7  3  5
3  4  3  9
4  6  5  3
5  3  8  6
6  2  8  1
7  7  2  7
8  4  1  5
9  1  6  6
```

To verify, compute the same slopes pairwise with scipy.stats.linregress():

```
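import itertools
import numpy as np
from scipy.stats import linregress

res = []
# x = col1 (row of the result), y = col2 (column of the result)
for col1, col2 in itertools.product(df.columns, repeat=2):
    res.append(linregress(df[col1], df[col2]).slope)
np.array(res).reshape(3, 3)
Out[72]:
array([[ 1.        , -0.68698061,  0.25207756],
       [-0.47328244,  1.        , -0.26335878],
       [ 0.1376702 , -0.20877458,  1.        ]])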
```
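For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the example `df` from the values shown above and uses an equivalent form of the same identity, cov(x, y) / var(x), so no broadcasting is needed. The helper name `slope_matrix` is made up for illustration and is not a pandas function, and the cross-check uses `np.polyfit` instead of `linregress` only to avoid the scipy dependency.

```python
import numpy as np
import pandas as pd

# Same values as the example DataFrame shown above.
df = pd.DataFrame({
    "A": [5, 4, 7, 4, 6, 3, 2, 7, 4, 1],
    "B": [6, 4, 3, 3, 5, 8, 8, 2, 1, 6],
    "C": [9, 2, 5, 9, 3, 6, 1, 7, 5, 6],
})


def slope_matrix(frame):
    """Pairwise OLS slopes; entry [x, y] is the slope of y regressed on x.

    Hypothetical helper name, chosen for this example only.
    """
    # cov(x, y) / var(x): divide each row of the covariance matrix
    # by the variance of that row's variable.
    return frame.cov().div(frame.var(), axis=0)


slopes = slope_matrix(df)

# Cross-check one entry against an explicit least-squares line fit.
fit_slope = np.polyfit(df["A"], df["B"], deg=1)[0]
assert np.isclose(slopes.loc["A", "B"], fit_slope)
```

As in the matrices above, `slopes.loc["A", "B"]` is the slope with A as x and B as y, so the result is not symmetric.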
