Zanshin - 1 year ago 66
Python Question

# Add in count of values and columns for totals

``````import pandas as pd
import numpy as np

df = pd.DataFrame( {
'A': ['d','d','d','f','f','f','g','g','g','h','h','h'],
'B': [5,5,6,7,5,6,6,7,7,6,7,7],
'C': [1,1,1,1,1,1,1,1,1,1,1,1],
'S': [2012,2013,2014,2015,2016,2012,2013,2014,2015,2016,2012,2013]
} );

df = (df.B + df.C).groupby([df.A, df.S]).sum().unstack(fill_value=0)
print (df)

S  2012  2013  2014  2015  2016
A
d     6     6     7     0     0
f     7     0     0     8     6
g     0     7     8     8     0
h     8     8     0     0     7
``````

I want to add in the count of values that have been summed in the dataframe per year as well as two additional columns [total of years] and [total count]

EDIT;

``````Dataframe should look something like this;
S  2012 2012 2013 2013 2014 2014 2015  2015 Tot(sum) Tot(#)
A
d     6   x    6    x    7    x    0     x     19      x
f     7   x    0    x    0    x    8     x     15      x
g     0   x    7    x    8    x    8     x     23      x
h     8   x    8    x    0    x    0     x     16      x
``````

I think you can use `aggregate` `sum` and `size`:

``````df = (df.B + df.C).groupby([df.A, df.S]).agg(['sum','size']).unstack(fill_value=0)
print (df)
sum                     size
S 2012 2013 2014 2015 2016 2012 2013 2014 2015 2016
A
d    6    6    7    0    0    1    1    1    0    0
f    7    0    0    8    6    1    0    0    1    1
g    0    7    8    8    0    0    1    1    1    0
h    8    8    0    0    7    1    1    0    0    1
``````

Then `groupby` by first level of columns and get `sum`, add level `total` to columns for `MultiIndex`:

``````df1 = df.groupby(level=0, axis=1).sum()
new_cols= list(zip(df1.columns.get_level_values(0),['total'] * len(df.columns)))
df1.columns = pd.MultiIndex.from_tuples(new_cols)
print (df1)
sum  size
total total
A
d    19     3
f    21     3
g    23     3
h    23     3
``````

Last `concat` both `DataFrames` and sort columns by `sort_index`:

``````df2 = pd.concat([df,df1], axis=1).sort_index(axis=1)
df2.loc['total'] = df2.sum()
print (df2)
size                            sum
S     2012 2013 2014 2015 2016 total 2012 2013 2014 2015 2016 total
A
d        1    1    1    0    0     3    6    6    7    0    0    19
f        1    0    0    1    1     3    7    0    0    8    6    21
g        0    1    1    1    0     3    0    7    8    8    0    23
h        1    1    0    0    1     3    8    8    0    0    7    23
total    3    3    2    2    2    12   21   21   15   16   13    86
``````

Another posible solution is `pivot_table`:

``````df['D'] = df.B + df.C
print (df.pivot_table(index='A',
columns='S',
values='D',
aggfunc=[np.sum, len],
fill_value=0,
margins=True,
margins_name='Total'))

sum                                len
S      2012  2013  2014  2015  2016 Total 2012 2013 2014 2015 2016 Total
A
d       6.0   6.0   7.0   0.0   0.0  19.0  1.0  1.0  1.0  0.0  0.0   3.0
f       7.0   0.0   0.0   8.0   6.0  21.0  1.0  0.0  0.0  1.0  1.0   3.0
g       0.0   7.0   8.0   8.0   0.0  23.0  0.0  1.0  1.0  1.0  0.0   3.0
h       8.0   8.0   0.0   0.0   7.0  23.0  1.0  1.0  0.0  0.0  1.0   3.0
Total  21.0  21.0  15.0  16.0  13.0  86.0  3.0  3.0  2.0  2.0  2.0  12.0
``````

Also if need convert values to `int`:

``````print (df.pivot_table(index='A',
columns='S',
values='D',
aggfunc=[np.sum, len],
fill_value=0,
margins=True,
margins_name='Total')
.astype(int))
sum                            len
S     2012 2013 2014 2015 2016 Total 2012 2013 2014 2015 2016 Total
A
d        6    6    7    0    0    19    1    1    1    0    0     3
f        7    0    0    8    6    21    1    0    0    1    1     3
g        0    7    8    8    0    23    0    1    1    1    0     3
h        8    8    0    0    7    23    1    1    0    0    1     3
Total   21   21   15   16   13    86    3    3    2    2    2    12
``````

``````df2 = pd.concat([df,df1], axis=1).sort_index(axis=1).sort_index(axis=1, level=1)
print (df2)
size  sum size  sum size  sum size  sum size  sum  size   sum
S 2012 2012 2013 2013 2014 2014 2015 2015 2016 2016 total total
A
d    1    6    1    6    1    7    0    0    0    0     3    19
f    1    7    0    0    0    0    1    8    1    6     3    21
g    0    0    1    7    1    8    1    8    0    0     3    23
h    1    8    1    8    0    0    0    0    1    7     3    23

df2.columns = df2.columns.droplevel(0)
print (df2)
S  2012  2012  2013  2013  2014  2014  2015  2015  2016  2016  total  total
A
d     1     6     1     6     1     7     0     0     0     0      3     19
f     1     7     0     0     0     0     1     8     1     6      3     21
g     0     0     1     7     1     8     1     8     0     0      3     23
h     1     8     1     8     0     0     0     0     1     7      3     23
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download