user3084006 - 5 months ago 287
Python Question

# Constructing a co-occurrence matrix in python pandas

I know how to do this in R, but is there a function in pandas that transforms a DataFrame into an n×n co-occurrence matrix containing the counts of two aspects co-occurring?

For example, a DataFrame df:

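``````
import pandas as pd

# Keys are listed in the order shown in the output below, so the column
# order is the same on pandas versions that preserve dict insertion order.
df = pd.DataFrame({'TFD' : ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
                   'Dop' : ['1', '0', '1', '0', '1', '1'],
                   'Snack' : ['1', '0', '1', '1', '0', '0'],
                   'Trans' : ['1', '1', '1', '0', '0', '1']}).set_index('TFD')

print(df)

>>>
    Dop Snack Trans
TFD
AA    1     1     1
SL    0     0     1
BB    1     1     1
D0    0     1     0
Dk    1     0     0
FF    1     0     1

[6 rows x 3 columns]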
``````

would yield:

``````    Dop Snack Trans

Dop   0     2     3
Snack 2     0     2
Trans 3     2     0
``````

Since the matrix is symmetric about the diagonal, I guess there is a way to optimize the code.

Answer

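It's simple linear algebra: you multiply the transpose of the matrix by the matrix itself. Your example contains strings, so don't forget to convert them to integers first. Note that the diagonal of the result holds each column's own count of ones: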

``````>>> df_asint = df.astype(int)
>>> coocc = df_asint.T.dot(df_asint)
>>> coocc
       Dop  Snack  Trans
Dop      4      2      3
Snack    2      3      2
Trans    3      2      4
``````
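
To see why this works: entry (i, j) of the product is the sum over rows of `x[r, i] * x[r, j]`, and for 0/1 data that product is 1 only when both columns are 1 in row r. Here is a minimal self-contained sketch that checks the matrix product against a brute-force count. The helper name `brute_force_cooccurrence` is my own for illustration, not a pandas function.

```python
import pandas as pd

# Same data as in the question, still stored as strings.
df = pd.DataFrame(
    {
        "Dop": ["1", "0", "1", "0", "1", "1"],
        "Snack": ["1", "0", "1", "1", "0", "0"],
        "Trans": ["1", "1", "1", "0", "0", "1"],
    },
    index=pd.Index(["AA", "SL", "BB", "D0", "Dk", "FF"], name="TFD"),
)

# Convert the string flags to integers, then compute X^T X.
df_asint = df.astype(int)
coocc = df_asint.T.dot(df_asint)


def brute_force_cooccurrence(frame):
    """Count, for every pair of columns, the rows where both are 1."""
    cols = list(frame.columns)
    out = pd.DataFrame(0, index=cols, columns=cols)
    for a in cols:
        for b in cols:
            out.loc[a, b] = int(((frame[a] == 1) & (frame[b] == 1)).sum())
    return out


expected = brute_force_cooccurrence(df_asint)
print(coocc)
print((coocc.values == expected.values).all())
```

The brute-force loop is only there as a cross-check; the matrix product is the version you would actually use.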

If, as in the R answer, you want to reset the diagonal to zero, you can use numpy's `fill_diagonal`:

``````>>> import numpy as np
>>> np.fill_diagonal(coocc.values, 0)
>>> coocc
       Dop  Snack  Trans
Dop      0      2      3
Snack    2      0      2
Trans    3      2      0
``````
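
One caveat: `np.fill_diagonal(coocc.values, 0)` only works if `.values` hands back a writable view of the frame's data. That is not guaranteed, and with pandas' Copy-on-Write behaviour the array is read-only, so the call raises an error. Here is a sketch that works on an explicit copy instead and also uses the symmetry mentioned in the question to list each pair once. The names `arr`, `coocc_no_diag` and `pairs` are my own.

```python
import numpy as np
import pandas as pd

labels = ["Dop", "Snack", "Trans"]
# The co-occurrence matrix computed above, written out literally so
# that this snippet runs on its own.
coocc = pd.DataFrame(
    [[4, 2, 3], [2, 3, 2], [3, 2, 4]], index=labels, columns=labels
)

# Work on an explicit, writable copy instead of mutating coocc in place.
arr = coocc.to_numpy(copy=True)
np.fill_diagonal(arr, 0)
coocc_no_diag = pd.DataFrame(arr, index=coocc.index, columns=coocc.columns)

# The matrix is symmetric, so the strict upper triangle already holds
# every distinct pair exactly once.
rows, cols = np.triu_indices(len(labels), k=1)
pairs = {(labels[i], labels[j]): int(arr[i, j]) for i, j in zip(rows, cols)}

print(coocc_no_diag)
print(pairs)
```

This leaves the original `coocc` untouched, which is handy if you still need the per-column totals on the diagonal.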