AMS AMS - 5 months ago 27
Python Question

PANDAS count on condition

I am trying to tabulate a change in condition using a 'groupby' but am stumped and would appreciate any guidance. I have a data frame as follows:

SUBJECT TYPE
1 1
1 2
1 2
2 1
2 1
3 1
3 3
3 5


I would like to generate a statement that tabulates any positive change, ignores any negative change, and generates a count of change per subject. For example, the output of the above would be:

Subject TYPE
1 1
2 0
3 2


Would I need create an if/else clause using pandas, or is there a simpler way to achieve this using summit? Maybe something like...

def tabchange(type, subject):
current_subject = subject[0]
type_diff = type - type
j = 1
for i in range(1,len(type)):
type_diff[i] = type[i] - type[i-j]
if subject[i] == current_subject:
if type_diff[i] > 0:
new_row = 1
j += 1
else:
j = 1
else:
new_row[i] = 0
current_subject = subject[i]
return new_row

Answer
import pandas as pd
df = pd.DataFrame({'SUBJECT': [1, 1, 1, 2, 2, 3, 3, 3], 
                   'TYPE': [1, 2, 2, 1, 1, 1, 3, 5]})
grouped = df.groupby('SUBJECT')
df['TYPE'] = grouped['TYPE'].diff() > 0
result = grouped['TYPE'].agg('sum')

yields

SUBJECT
1    1.0
2    0.0
3    2.0
Name: TYPE, dtype: float64

Above, df is grouped by SUBJECT and the diff is taken of the TYPE column:

In [253]: grouped = df.groupby('SUBJECT'); df['TYPE'] = grouped['TYPE'].diff() > 0 

In [254]: df
Out[254]: 
   SUBJECT   TYPE
0        1  False
1        1   True
2        1  False
3        2  False
4        2  False
5        3  False
6        3   True
7        3   True

Then, again grouping by SUBJECT, the result is obtained by counting the number of Trues in the TYPE column:

In [255]: result = grouped['TYPE'].agg('sum'); result
Out[255]: 
SUBJECT
1    1.0
2    0.0
3    2.0
Name: TYPE, dtype: float64