D hagen D hagen - 5 months ago 73
Python Question

Using first row in Pandas groupby dataframe to calculate cumulative difference

I have the following grouped dataframe based on daily data

Studentid Year Month BookLevel

JSmith 2015 12 1.4
2016 1 1.6
2 1.8
3 1.2
4 2.0

MBrown 2016 1 3.0
2 3.2
3 3.6


I want to calculate the difference from the starting point in BookLevel for each Studentid. The current BookLevel is a .max calculation from the GroupBy to get the highest bookLevel for each month for each student

What I am looking for is something like this:

Studentid Year Month BookLevel Progress Since Start

JSmith 2015 12 1.4 0 (or NAN)
2016 1 1.6 .2
2 1.8 .4
3 1.2 -.2
4 2.0 .6

2016 1 3.0 0 (or NAN)
MBrown 2 3.2 .2
3 3.6 .6


I'm new to Python/Pandas and have tried a number of things and nothing comes close.

Answer

OK, this should work, if we groupby on the first level and subtract BookLevel from the series returned by calling transform with first then we can add this as the new desired column:

In [47]:
df['ProgressSinceStart'] = df['BookLevel'] - df.groupby(level='Studentid')['BookLevel'].transform('first')
df

Out[47]:
                      BookLevel  ProgressSinceStart
Studentid Year Month                               
JSmith    2015 12           1.4                 0.0
          2016 1            1.6                 0.2
               2            1.8                 0.4
               3            1.2                -0.2
               4            2.0                 0.6
MBrown    2016 1            3.0                 0.0
               2            3.2                 0.2
               3            3.6                 0.6
Comments