Warz Warz - 2 months ago 5
Python Question

Pandas: Use apply to sum row and column on data frame

import datetime
import pandas as pd
import numpy as np

todays_date = datetime.datetime.now().date()
index = pd.date_range(todays_date-datetime.timedelta(10), periods=10, freq='D')

columns = ['A', 'B', 'C']
# Three identical columns, each holding the values 0..9
data = np.array([np.arange(10)]*3).T
df = pd.DataFrame(data, index=index, columns=columns)


Given the df, I would like to group by each column and apply a function that takes the value for each date and divides it by the total for that group (A, B, C).

Example:

def total_calc(grp):
    # Sum all values in the group passed in by apply
    sum_of_group = np.sum(grp)
    return sum_of_group


I am trying to use the 'apply' function on my data frame in this fashion, but axis=1 only works on rows and axis=0 only works on columns. How can I get both data points for each group?

df.groupby(["A"]).apply(total_calc)


Any ideas?

Answer

I am not sure I understand the question, so this is my best guess. First, I prefer not to work with integer values here, so let's convert your df to float:

df = df.astype(float)

If you want to divide each element of column A by the sum of column A, and likewise for B and C, you can do this:

df.div(df.sum(axis=0), axis=1)
Out[24]: 
                   A         B         C
2016-09-24  0.000000  0.000000  0.000000
2016-09-25  0.022222  0.022222  0.022222
2016-09-26  0.044444  0.044444  0.044444
2016-09-27  0.066667  0.066667  0.066667
2016-09-28  0.088889  0.088889  0.088889
2016-09-29  0.111111  0.111111  0.111111
2016-09-30  0.133333  0.133333  0.133333
2016-10-01  0.155556  0.155556  0.155556
2016-10-02  0.177778  0.177778  0.177778
2016-10-03  0.200000  0.200000  0.200000
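
Since the question mentions both rows and columns, here is a minimal sketch of how the same idea might be applied in either direction. The fixed start date and the names col_share, col_share_apply and row_share are my own assumptions for illustration, not something from the question.

```python
import numpy as np
import pandas as pd

# Fixed start date (an assumption, so the result is reproducible)
index = pd.date_range("2016-09-24", periods=10, freq="D")
data = np.array([np.arange(10)] * 3).T
df = pd.DataFrame(data, index=index, columns=["A", "B", "C"]).astype(float)

# Share of each column total: divide every column by its own sum
col_share = df.div(df.sum(axis=0), axis=1)

# Same result with apply: with axis=0 the function receives one column at a time
col_share_apply = df.apply(lambda col: col / col.sum(), axis=0)

# Share of each row total: divide every row by its own sum.
# The first row sums to 0, so 0/0 gives NaN there.
row_share = df.div(df.sum(axis=1), axis=0)
```

With this particular data every column is identical, so each non-zero row splits evenly across A, B and C in row_share. The div version avoids a Python-level function call per column, which is why I would reach for it before apply.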