user3084006 - 9 months ago 397

Python Question

I know how to do this in R, but is there a function in pandas that transforms a DataFrame into an n×n co-occurrence matrix containing the counts of two aspects co-occurring?

For example, a DataFrame `df`:

```
import pandas as pd

df = pd.DataFrame({'TFD' : ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
                   'Snack' : ['1', '0', '1', '1', '0', '0'],
                   'Trans' : ['1', '1', '1', '0', '0', '1'],
                   'Dop' : ['1', '0', '1', '0', '1', '1']}).set_index('TFD')
print(df)

    Dop Snack Trans
TFD
AA    1     1     1
SL    0     0     1
BB    1     1     1
D0    0     1     0
Dk    1     0     0
FF    1     0     1

[6 rows x 3 columns]
```

would yield:

```
       Dop  Snack  Trans
Dop      0      2      3
Snack    2      0      2
Trans    3      2      0
```

Since the matrix is symmetric about the diagonal, I guess there would be a way to optimize the code.

Answer

It's simple linear algebra: you multiply the transpose of the matrix by the matrix itself (your example contains strings, so don't forget to convert them to integers):

```
>>> df_asint = df.astype(int)
>>> coocc = df_asint.T.dot(df_asint)
>>> coocc
       Dop  Snack  Trans
Dop      4      2      3
Snack    2      3      2
Trans    3      2      4
```
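Why this works: entry (i, j) of the product is the sum over rows of `x[row, i] * x[row, j]`, and with 0/1 values that term is 1 only when both columns are set in the same row. The diagonal therefore holds each column's own total. The sketch below rebuilds the question's frame and cross-checks the product against a brute-force count; `count_pair` is a hypothetical helper written for this check, not a pandas function.

```python
import itertools

import pandas as pd

# Same data as in the question; values are strings, so cast to int first.
df = pd.DataFrame({'TFD': ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
                   'Snack': ['1', '0', '1', '1', '0', '0'],
                   'Trans': ['1', '1', '1', '0', '0', '1'],
                   'Dop': ['1', '0', '1', '0', '1', '1']}).set_index('TFD')
df_asint = df.astype(int)

# Matrix-product version from the answer.
coocc = df_asint.T.dot(df_asint)


def count_pair(frame, a, b):
    """Hypothetical helper: count rows where columns a and b are both 1."""
    return int(((frame[a] == 1) & (frame[b] == 1)).sum())


# Compare every (a, b) cell by label, so column order does not matter.
mismatches = [
    (a, b)
    for a, b in itertools.product(df_asint.columns, repeat=2)
    if coocc.loc[a, b] != count_pair(df_asint, a, b)
]
print(mismatches)  # an empty list means both methods agree
```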

If, as in the R answer, you want to reset the diagonal to zero, you can use NumPy's `fill_diagonal`:

```
>>> import numpy as np
>>> np.fill_diagonal(coocc.values, 0)
>>> coocc
       Dop  Snack  Trans
Dop      0      2      3
Snack    2      0      2
Trans    3      2      0
```
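One caveat: the in-place call relies on `coocc.values` being a writable view of the frame's data. On recent pandas versions with copy-on-write enabled, that array may be read-only, and the call can then fail. A minimal sketch that avoids this works on an explicit copy and rebuilds the frame. It also touches on the symmetry point from the question by listing each unordered pair once; the names `counts`, `coocc_nodiag`, and `pairs` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'TFD': ['AA', 'SL', 'BB', 'D0', 'Dk', 'FF'],
                   'Snack': ['1', '0', '1', '1', '0', '0'],
                   'Trans': ['1', '1', '1', '0', '0', '1'],
                   'Dop': ['1', '0', '1', '0', '1', '1']}).set_index('TFD')
df_asint = df.astype(int)
coocc = df_asint.T.dot(df_asint)

# Work on an explicit, writable copy instead of mutating coocc.values.
counts = coocc.to_numpy(copy=True)
np.fill_diagonal(counts, 0)
coocc_nodiag = pd.DataFrame(counts, index=coocc.index, columns=coocc.columns)
print(coocc_nodiag)

# The matrix is symmetric, so the strict upper triangle (k=1) already
# contains every unordered pair exactly once.
rows, cols = np.triu_indices_from(counts, k=1)
pairs = {
    (coocc.index[i], coocc.columns[j]): int(counts[i, j])
    for i, j in zip(rows, cols)
}
print(pairs)
```

The product itself is still computed in full here; the upper triangle only avoids reading each pair twice afterwards.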