Paradox Paradox - 19 days ago 6

Python pandas: how to find the top strings that co-occur with a given string?

I have generated a co-occurrence matrix by using the Python pandas library, with the following code:

import pandas as pd

# dfdo is an ordered dictionary with a key called KEY453
df = pd.DataFrame(dfdo).set_index('KEY453')
df_asint = df.astype(int)
com = df_asint.T.dot(df_asint)


It follows the same procedure as this question.
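
To make the setup reproducible, here is a minimal sketch of the same procedure on made-up data. The contents of `dfdo` (four hypothetical documents and three strings) are an assumption chosen so that the result matches the example matrix shown in the question. The diagonal-zeroing step is also an addition: the dot product alone leaves each string's own total count on the diagonal.

```python
import numpy as np
import pandas as pd

# Hypothetical indicator data: one row per document, 1 if the string appears in it.
dfdo = {
    "KEY453": ["doc1", "doc2", "doc3", "doc4"],
    "Cat":    [1, 1, 1, 1],
    "Dog":    [1, 1, 0, 0],
    "Zebra":  [1, 0, 1, 1],
}

df = pd.DataFrame(dfdo).set_index("KEY453")
df_asint = df.astype(int)

# Entry (i, j) counts the rows in which both string i and string j appear.
com = df_asint.T.dot(df_asint)

# The diagonal holds each string's own total count; zero it so a string
# is never reported as co-occurring with itself.
values = com.to_numpy().copy()
np.fill_diagonal(values, 0)
com = pd.DataFrame(values, index=com.index, columns=com.columns)

print(com)
```

Without the zeroing step, a string would usually rank first in its own row, because it co-occurs with itself in every document that contains it.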

My question is: how can I find the top 2 strings that co-occur with a given string in the matrix? For example, the top 2 strings that co-occur with Dog in the example below are Cat and Zebra.

       Cat  Dog  Zebra
Cat      0    2      3
Dog      2    0      1
Zebra    3    1      0

Answer

I think you can use nlargest (here df is the co-occurrence matrix, called com in the question):

print (df.loc['Dog'].nlargest(2))
Cat      2
Zebra    1
Name: Dog, dtype: int64

print (df.loc['Dog'].nlargest(2).index)
Index(['Cat', 'Zebra'], dtype='object')
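
If the matrix has a non-zero diagonal, the string itself may come out on top. A minimal sketch of a helper that guards against this follows; the function name `top_cooccurring` and the hand-built matrix are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# The example matrix from the question, rebuilt by hand.
labels = ["Cat", "Dog", "Zebra"]
com = pd.DataFrame([[0, 2, 3], [2, 0, 1], [3, 1, 0]], index=labels, columns=labels)

def top_cooccurring(matrix, label, n=2):
    """Return the n labels with the highest co-occurrence counts for `label`."""
    # Drop the label's own entry so a non-zero diagonal cannot put it on top.
    row = matrix.loc[label].drop(label)
    return row.nlargest(n).index.tolist()

print(top_cooccurring(com, "Dog"))
```

With the default settings, `nlargest` keeps the first of any tied values, so ties are resolved in column order.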

If you need the top 2 for every row of the DataFrame, use numpy.argsort:

print (np.argsort(-df.values, axis=1)[:, :2])
[[2 1]
 [0 2]
 [0 1]]

# Index a NumPy array of the column names; recent pandas versions reject 2-D indexing on an Index
print (df.columns.to_numpy()[np.argsort(-df.values, axis=1)[:, :2]])
[['Zebra' 'Dog']
 ['Cat' 'Zebra']
 ['Cat' 'Dog']]

print (pd.DataFrame(df.columns.to_numpy()[np.argsort(-df.values, axis=1)[:, :2]],
                    index=df.index,
                    columns=['first','second']))

       first second
Cat    Zebra    Dog
Dog      Cat  Zebra
Zebra    Cat    Dog
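
The argsort approach generalizes to any number of top entries. The sketch below is one way to wrap it; the function name `top_n_table` and the `top_1`, `top_2` column names are assumptions for illustration. It assumes the diagonal is zero and that `n` is smaller than the number of labels, otherwise a string can appear in its own row.

```python
import numpy as np
import pandas as pd

# The example matrix from the question, rebuilt by hand.
labels = ["Cat", "Dog", "Zebra"]
df = pd.DataFrame([[0, 2, 3], [2, 0, 1], [3, 1, 0]], index=labels, columns=labels)

def top_n_table(matrix, n=2):
    """Build a table whose row i lists the n strongest co-occurring labels for label i."""
    # Negate to sort descending; a stable sort keeps tied columns in left-to-right order.
    order = np.argsort(-matrix.to_numpy(), axis=1, kind="stable")[:, :n]
    # Index a NumPy array of names, since a pandas Index does not accept 2-D indexers.
    names = matrix.columns.to_numpy()[order]
    return pd.DataFrame(names, index=matrix.index,
                        columns=[f"top_{i + 1}" for i in range(n)])

print(top_n_table(df, n=2))
```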

Or use apply with nlargest on each row:

print (df.apply(lambda x: pd.Series(x.nlargest(2).index, index=['first','second']), axis=1))
       first second
Cat    Zebra    Dog
Dog      Cat  Zebra
Zebra    Cat    Dog