Gregor Sturm - 8 days ago
Python Question

Pandas dataframe to count matrix

This must be obvious, but I couldn't find an easy solution.

I have a pandas DataFrame like this:

actual | predicted
------ + ---------
Apple | Apple
Apple | Apple
Apple | Banana
Banana | Orange
Orange | Apple


I want this:

       | Apple | Banana | Orange
------ + ----- + ------ + ------
Apple  | 2     | 1      | 0
Banana | 0     | 0      | 1
Orange | 1     | 0      | 0

Answer

You can use groupby on both columns, aggregate with size, and then unstack the resulting MultiIndex, filling combinations that never occur with 0:

# Assign to a new name so the original df stays available
counts = df.groupby(['actual', 'predicted']).size().unstack(fill_value=0)
print(counts)
predicted  Apple  Banana  Orange
actual                          
Apple          2       1       0
Banana         0       0       1
Orange         1       0       0
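
To try the groupby approach end to end, here is a minimal self-contained sketch that rebuilds the example data from the question. The names `pair_counts` and `counts` are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    'actual':    ['Apple', 'Apple', 'Apple', 'Banana', 'Orange'],
    'predicted': ['Apple', 'Apple', 'Banana', 'Orange', 'Apple'],
})

# size() counts the rows for each (actual, predicted) pair and returns
# a Series indexed by a two-level MultiIndex.
pair_counts = df.groupby(['actual', 'predicted']).size()

# unstack() moves the inner level (predicted) into the columns;
# fill_value=0 fills in the pairs that never occur.
counts = pair_counts.unstack(fill_value=0)
print(counts)
```

Splitting the chain into two steps makes the intermediate MultiIndex Series visible, which helps when the final table does not look as expected.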

Another solution is crosstab, which builds the same frequency table directly from the two columns:

# Assign to a new name so the original df stays available
counts = pd.crosstab(df['actual'], df['predicted'])
print(counts)
predicted  Apple  Banana  Orange
actual                          
Apple          2       1       0
Banana         0       0       1
Orange         1       0       0
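
As a quick check, the sketch below runs both approaches on the question's data and compares the results. The last step is an addition that is not part of the original answer: neither approach creates a row or column for a label that never appears on that side, so reindexing over the union of labels is one way to force a square matrix. The names `by_groupby`, `by_crosstab`, `labels` and `square` are illustrative.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    'actual':    ['Apple', 'Apple', 'Apple', 'Banana', 'Orange'],
    'predicted': ['Apple', 'Apple', 'Banana', 'Orange', 'Apple'],
})

# Approach 1: groupby + size + unstack.
by_groupby = df.groupby(['actual', 'predicted']).size().unstack(fill_value=0)

# Approach 2: crosstab on the two columns.
by_crosstab = pd.crosstab(df['actual'], df['predicted'])

# Compare the two count matrices cell by cell.
same = bool((by_groupby.values == by_crosstab.values).all())
print(same)

# Optional (assumption, not in the original answer): make the matrix
# square over every label seen in either column.
labels = sorted(set(df['actual']) | set(df['predicted']))
square = by_crosstab.reindex(index=labels, columns=labels, fill_value=0)
print(square)
```

For this data every label appears in both columns, so the reindexed matrix has the same shape as the crosstab output.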