Javier Ayres Javier Ayres - 3 months ago 13
Python Question

Partial sums and subtotals with Pandas

I'm trying to achieve a table with subtotals as shown here, but either that code doesn't work with the latest pandas version (0.18.1) or the example is wrong for multiple columns instead of one. My code here results in the following table

2014 2015 2016
project__name person__username activity__name issue__subject
Influenster employee1 Development 161.0 122.0 104.0
Fix bug 22.0 0.0 0.0
Refactor view 0.0 7.0 0.0
Quality assurance 172.0 158.0 161.0
employee2 Development 119.0 137.0 155.0
Quality assurance 193.0 186.0 205.0
employee3 Development Refactor view 0.0 0.0 1.0
Profit tools employee1 Development 177.0 136.0 216.0
Quality assurance 162.0 122.0 182.0
employee2 Development 154.0 168.0 124.0
Quality assurance 130.0 183.0 192.0
Fix bug 22.0 0.0 0.0
All 1312.0 1219.0 1340.0


and my desired output would be something like:

2014 2015 2016
project__name person__username activity__name issue__subject
Influenster employee1 Development 161.0 122.0 104.0
Fix bug 22.0 0.0 0.0
Refactor view 0.0 7.0 0.0
Total xxx xxx xxx
Quality assurance 172.0 158.0 161.0
Total xxx xxx xxx
Total xxx xxx xxx
employee2 Development 119.0 137.0 155.0
Total xxx xxx xxx
Quality assurance 193.0 186.0 205.0
Total xxx xxx xxx
Total xxx xxx xxx
employee3 Development Refactor view 0.0 0.0 1.0
Total xxx xxx xxx
Total xxx xxx xxx
Total xxx xxx xxx
Profit tools employee1 Development 177.0 136.0 216.0
Total xxx xxx xxx
Quality assurance 162.0 122.0 182.0
Total xxx xxx xxx
Total xxx xxx xxx
employee2 Development 154.0 168.0 124.0
Total xxx xxx xxx
Quality assurance 130.0 183.0 192.0
Fix bug 22.0 0.0 0.0
Total xxx xxx xxx
Total xxx xxx xxx
Total xxx xxx xxx
All 1312.0 1219.0 1340.0


Any help on how to achieve this is appreciated.

Answer

Recursive groupby and apply

def append_tot(df):
    if hasattr(df, 'name') and df.name is not None:
        xs = df.xs(df.name)
    else:
        xs = df
    gb = xs.groupby(level=0)
    n = xs.index.nlevels
    name = tuple('Total' if i == 0 else '' for i in range(n))
    tot = gb.sum().sum().rename(name).to_frame().T
    if n > 1:
        sm = gb.apply(append_tot3)
    else:
        sm = gb.sum()
    return pd.concat([sm, tot])

fields = ['project__name', 'person__username',
          'activity__name', 'issue__subject']
append_tot(df.set_index(fields))

enter image description here

Comments