Daniel - 1 year ago
Python Question

Correlation Matrix: Extract Variables with High R Values

How can I get output that lists only the variables whose absolute correlation is greater than 0.7?

I would like output similar to this:

four: one, three
one: three

Thanks for your time!


import pandas as pd



           four       one     three       two
four   1.000000 -0.989949 -0.880830 -0.670820
one   -0.989949  1.000000  0.913500  0.632456
three -0.880830  0.913500  1.000000  0.262613
two   -0.670820  0.632456  0.262613  1.000000


If all you want is to print it out, this will work:

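corr = y.corr()  # compute the correlation matrix once
col_names = corr.columns.values

# items() replaces iteritems(), which was removed in pandas 2.0
for col, row in (corr.abs() > 0.7).items():
    print(col, col_names[row.values])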

I get the following results:

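Note that this works, but it might be slow on wide frames, because iterating with iteritems (called items in newer pandas) builds a Series for each column. The output also includes each variable's correlation with itself, since the diagonal is always 1.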
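
To get closer to the format asked for in the question (each pair listed once, no self-correlations), one option is to keep only the strict upper triangle of the matrix. This is a minimal sketch, not the original answer's method. It assumes the matrix from the question is rebuilt by hand as `corr` in place of `y.corr()`, and `high_corr_pairs` is a made-up helper name:

```python
import numpy as np
import pandas as pd

# Correlation matrix copied from the question (stand-in for y.corr()).
labels = ["four", "one", "three", "two"]
corr = pd.DataFrame(
    [
        [1.000000, -0.989949, -0.880830, -0.670820],
        [-0.989949, 1.000000, 0.913500, 0.632456],
        [-0.880830, 0.913500, 1.000000, 0.262613],
        [-0.670820, 0.632456, 0.262613, 1.000000],
    ],
    index=labels,
    columns=labels,
)


def high_corr_pairs(corr, threshold=0.7):
    """Map each variable to the later variables whose |r| exceeds threshold."""
    # Strict upper triangle: drops the diagonal and the mirrored duplicates.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    strong = (corr.abs().to_numpy() > threshold) & upper
    result = {}
    for i, name in enumerate(corr.index):
        partners = list(corr.columns[strong[i]])
        if partners:  # skip variables with no strong partner
            result[name] = partners
    return result


pairs = high_corr_pairs(corr)
for name, partners in pairs.items():
    print(f"{name}: {', '.join(partners)}")
```

Because the mask is built once with NumPy, the loop only touches row labels, so it avoids creating a Series per column. Which variable a pair is listed under depends on the column order of the matrix.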