Brian - 4 months ago
Python Question

correlation matrix of one dataframe with another

I was reading through the answers to this question. A follow-up question then came up there: how do you calculate the correlations of all columns from one dataframe with all columns from another dataframe? Since it seemed that follow-up wasn't going to get answered, I wanted to ask it here, as I need exactly that.

So say I have dataframes A and B:

import pandas as pd
import numpy as np

A = pd.DataFrame(np.random.rand(24, 5), columns=list('abcde'))
B = pd.DataFrame(np.random.rand(24, 5), columns=list('ABCDE'))


How do I get a dataframe that looks like this:

pd.DataFrame([], A.columns, B.columns)

     A    B    C    D    E
a  NaN  NaN  NaN  NaN  NaN
b  NaN  NaN  NaN  NaN  NaN
c  NaN  NaN  NaN  NaN  NaN
d  NaN  NaN  NaN  NaN  NaN
e  NaN  NaN  NaN  NaN  NaN


But filled with the appropriate correlations?

Answer

One way to do it would be:

pd.concat([A, B], axis=1).corr().filter(B.columns).filter(A.columns, axis=0)
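To see what that chained call is doing, it can be unrolled into steps. This is only a sketch: the intermediate names `combined`, `full_corr` and `cross_corr` are made up for illustration, the seed is an assumption added so the run is repeatable, and `.loc` is swapped in for the two `.filter` calls because it selects the same block in one step.

```python
import numpy as np
import pandas as pd

# Seed is an assumption for repeatability; the question uses unseeded data.
np.random.seed(0)
A = pd.DataFrame(np.random.rand(24, 5), columns=list('abcde'))
B = pd.DataFrame(np.random.rand(24, 5), columns=list('ABCDE'))

# Put both frames side by side: 24 rows, 10 columns (a..e, A..E).
combined = pd.concat([A, B], axis=1)

# Full 10x10 Pearson correlation matrix of every column with every other.
full_corr = combined.corr()

# Keep only the block with A's columns as rows and B's columns as columns.
cross_corr = full_corr.loc[A.columns, B.columns]

print(cross_corr.shape)  # (5, 5)
```

Note that this relies on A and B having different column names. If the names overlapped, the concatenated frame would have duplicate labels and the selection would no longer pick out a clean 5x5 block.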


A more efficient way, which avoids computing the correlations of A with itself and of B with itself, would be:

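# Center each column by subtracting its mean.
Az = A - A.mean()
Bz = B - B.mean()

# Cross-covariance (ddof=0) divided by both sets of standard deviations (ddof=0).
Az.T.dot(Bz).div(len(A)).div(Bz.std(ddof=0)).div(Az.std(ddof=0), axis=0)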

And you'd get the same as above.
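A quick way to convince yourself of that is to compute both and compare them. The sketch below assumes a fixed seed (not in the question) and uses the made-up names `via_corr` and `via_dot`. It works because the Pearson correlation does not depend on `ddof`, as long as the covariance and the standard deviations use the same value.

```python
import numpy as np
import pandas as pd

# Seed is an assumption for repeatability; the question uses unseeded data.
np.random.seed(0)
A = pd.DataFrame(np.random.rand(24, 5), columns=list('abcde'))
B = pd.DataFrame(np.random.rand(24, 5), columns=list('ABCDE'))

# First approach: full correlation matrix, then cut out the A-by-B block.
via_corr = pd.concat([A, B], axis=1).corr().filter(B.columns).filter(A.columns, axis=0)

# Second approach: centered dot product scaled by the standard deviations.
Az = A - A.mean()
Bz = B - B.mean()
via_dot = Az.T.dot(Bz).div(len(A)).div(Bz.std(ddof=0)).div(Az.std(ddof=0), axis=0)

# Compare element-wise within floating-point tolerance.
print(np.allclose(via_corr, via_dot))
```

The comparison uses `np.allclose` rather than `==` because the two routes accumulate rounding error differently, so exact equality is not guaranteed.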
