Gerrit Gerrit - 3 months ago 8
Python Question

Count occurences in DataFrame

I've a Dataframe in this format:

| Department | Person | Power | ... |
|------------|--------|--------|-----|
| ABC | 1234 | 75 | ... |
| ABC | 1235 | 25 | ... |
| DEF | 1236 | 50 | ... |
| DEF | 1237 | 100 | ... |
| DEF | 1238 | 25 | ... |
| DEF | 1239 | 50 | ... |


What I now want to get is the sum of occurences for each value in the power column. How can I get this from my DataFrame?

| Department | 100 | 75 | 50 | 25 |
|------------|-----|-----|-----|-----|
| ABC | 0 | 1 | 0 | 1 |
| DEF | 1 | 0 | 2 | 1 |

Answer

You can use value_counts with sort_index, then generate DataFrame by to_frame and last transpose by T:

print (df.Power.value_counts().sort_index(ascending=False).to_frame().T)
       100  75   50   25 
Power    1    1    2    2

EDIT by comment:

You need crosstab:

print (pd.crosstab(df.Department, df.Power).sort_index(axis=1, ascending=False))
Power       100  75   50   25 
Department                    
ABC           0    1    0    1
DEF           1    0    2    1

Faster another solution with groupby and unstack:

print (df.groupby(['Department','Power'])
         .size()
         .unstack(fill_value=0)
         .sort_index(axis=1, ascending=False))

Power       100  75   50   25 
Department                    
ABC           0    1    0    1
DEF           1    0    2    1

If need groupby by columns Department and Person, add column Person to groupby to second position (thank you piRSquared):

print (df.groupby(['Department','Person', 'Power'])
         .size()
         .unstack(fill_value=0)
         .sort_index(axis=1, ascending=False))

Power              100  75   50   25 
Department Person                    
ABC        1234      0    1    0    0
           1235      0    0    0    1
DEF        1236      0    0    1    0
           1237      1    0    0    0
           1238      0    0    0    1
           1239      0    0    1    0

EDIT1 by comment:

If need add another missing values, use reindex:

print (df.groupby(['Department','Power'])
         .size()
         .unstack(fill_value=0)
         .reindex(columns=[100,75,50,25,0], fill_value=0))

Power       100  75   50   25   0  
Department                         
ABC           0    1    0    1    0
DEF           1    0    2    1    0
Comments