Brian Brian - 1 year ago 84
Python Question

correlation matrix of one dataframe with another

I was reading through the answers to this question, where a follow-up question came up: how to calculate the correlations of all columns in one dataframe with all columns in the other. Since it seemed that follow-up wasn't going to get answered, I wanted to ask it here, as I need something just like that.

Say I have these dataframes:

import pandas as pd
import numpy as np

A = pd.DataFrame(np.random.rand(24, 5), columns=list('abcde'))
B = pd.DataFrame(np.random.rand(24, 5), columns=list('ABCDE'))

How do I get a dataframe that looks like this:

pd.DataFrame([], A.columns, B.columns)

     A    B    C    D    E
a  NaN  NaN  NaN  NaN  NaN
b  NaN  NaN  NaN  NaN  NaN
c  NaN  NaN  NaN  NaN  NaN
d  NaN  NaN  NaN  NaN  NaN
e  NaN  NaN  NaN  NaN  NaN

But with each cell holding the correlation between the column of A named by the row label and the column of B named by the column label?

Answer

One way to do it would be:

pd.concat([A, B], axis=1).corr().filter(B.columns).filter(A.columns, axis=0)
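A minimal sketch of the same concat-and-filter idea wrapped in a small function. The name `cross_corr_concat` is hypothetical, and the sketch assumes that A and B share the same row index and have no column names in common (otherwise `concat` would create duplicate labels and the selection becomes ambiguous). It uses `.loc[A.columns, B.columns]` as an equivalent of the two `filter` calls and checks that both give the same block.

```python
import numpy as np
import pandas as pd


def cross_corr_concat(A: pd.DataFrame, B: pd.DataFrame) -> pd.DataFrame:
    """Correlate every column of A with every column of B (hypothetical helper).

    Assumes A and B share the same row index and have disjoint column names.
    """
    # Full (m + k) x (m + k) correlation matrix of the side-by-side frames
    full = pd.concat([A, B], axis=1).corr()
    # Keep only the off-diagonal block: rows from A, columns from B
    return full.loc[A.columns, B.columns]


rng = np.random.default_rng(0)
A = pd.DataFrame(rng.random((24, 5)), columns=list('abcde'))
B = pd.DataFrame(rng.random((24, 5)), columns=list('ABCDE'))

result = cross_corr_concat(A, B)
# The filter-based one-liner selects the same block in the same order
same = pd.concat([A, B], axis=1).corr().filter(B.columns).filter(A.columns, axis=0)
print(np.allclose(result, same))
```

Each cell is an ordinary Pearson correlation, so `result.loc['a', 'A']` matches `np.corrcoef(A['a'], B['A'])[0, 1]`.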


A more efficient way, which computes only the A-by-B block instead of the full correlation matrix, would be:

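Az = (A - A.mean())
Bz = (B - B.mean())
# population covariance block, divided by B's column stds and then A's column stds (row-wise)
Az.T.dot(Bz).div(len(A)).div(Bz.std(ddof=0)).div(Az.std(ddof=0), axis=0)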

This gives the same result as above, up to floating-point rounding.
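The reasoning behind the faster version: after centering each column, the Pearson correlation of a column of A with a column of B is the dot product of the two centered columns divided by n and by both population standard deviations. The concat approach computes the full (m + k) x (m + k) matrix and discards most of it, while the dot-product form computes only the m x k block. Below is a minimal NumPy sketch of the same computation in a hypothetical helper `cross_corr`; it assumes both frames have the same number of rows, aligned row by row, and contain no missing values.

```python
import numpy as np
import pandas as pd


def cross_corr(A: pd.DataFrame, B: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation of each column of A with each column of B.

    Hypothetical helper. Assumes equal row counts, aligned rows, and no NaNs.
    """
    a = A.to_numpy(dtype=float)
    b = B.to_numpy(dtype=float)
    n = a.shape[0]
    # Center each column on its mean
    az = a - a.mean(axis=0)
    bz = b - b.mean(axis=0)
    # Population covariance block, shape (len(A.columns), len(B.columns))
    cov = az.T @ bz / n
    # Scale row i by std of A's column i and column j by std of B's column j
    corr = cov / np.outer(az.std(axis=0), bz.std(axis=0))
    return pd.DataFrame(corr, index=A.columns, columns=B.columns)


rng = np.random.default_rng(0)
A = pd.DataFrame(rng.random((24, 5)), columns=list('abcde'))
B = pd.DataFrame(rng.random((24, 5)), columns=list('ABCDE'))

fast = cross_corr(A, B)
slow = pd.concat([A, B], axis=1).corr().loc[A.columns, B.columns]
print(np.allclose(fast, slow))
```

One design caveat: `DataFrame.corr` excludes missing values pairwise, whereas this sketch does not, so with NaNs in the data the two results will differ.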