Patthebug Patthebug - 4 months ago 15
Python Question

Common elements between columns of a DataFrame

I have a

Pandas
DataFrame
that looks like this:

MemberID A B C D
1 0.3 0.5 0.1 0
2 0 0.2 0.9 0.3
3 0.4 0.2 0.5 0.3
4 0.1 0 0 0.7


I would like to have another matrix which gives me the number of non-zero elements for the intersection of every column except for
MemberID
.

For example, the intersection of columns
A
and
B
would be 2 (because
MemberID
1 and 3 have non-zero values for
A
and
B
), intersection of
A
and
C
would be 2 as well (because
MemberID
1 and 3 have non-zero values for
A
and
C
).

The final matrix would look like this:

A B C D
A 3 2 2 2
B 2 3 3 2
C 2 3 3 2
D 2 2 2 3


As we can see, it should be a symmetric matrix, similar to a correlation matrix, but not the correlation matrix.

Intersection of any 2 columns = # of
MemberID
having non-zero values in both columns.


I would show some initial code here but I feel like there would be a simple function to do this task that I don't know of.

Here's the code to create the
DataFrame
:

df = pd.DataFrame([[0.3, 0.5, 0.1, 0],
[0, 0.2, 0.9, 0.3],
[ 0.4, 0.2, 0.5, 0.3],
[ 0.1, 0, 0, 0.7]],
columns=list('ABCD'))


Any pointers would be appreciated. TIA.

Answer

This should to it:

z = (df != 0) * 1
z.T.dot(z)

enter image description here

Comments