Trexion Kameha - 7 months ago
Python Question

Python - Summary Statistics using date and name

In Python, I have time series data. The data is keyed by date and name, and has 4 attributes: A, B, C and D.

I need to do some summary data analysis on this dataset:

1) For each name, average of A, B, C and D

2) For each name, standard deviation of A, B, C, and D

3) For each name, the number of NaNs as a percentage of the total for each of A, B, C, and D

I am familiar with R but not Python. If you can point me in the right direction that would be more than enough! Thank you.

import numpy as np
import pandas as pd

# One frame of random values per name, all sharing the same date index
asof_dt = pd.date_range('20151231', '20160130')
df1 = pd.DataFrame(np.random.randn(len(asof_dt), 4), index=asof_dt, columns=('A', 'B', 'C', 'D'))
df1['name'] = 'alpha'
df2 = pd.DataFrame(np.random.randn(len(asof_dt), 4), index=asof_dt, columns=('A', 'B', 'C', 'D'))
df2['name'] = 'beta'
df3 = pd.DataFrame(np.random.randn(len(asof_dt), 4), index=asof_dt, columns=('A', 'B', 'C', 'D'))
df3['name'] = 'gama'
df_total = pd.concat([df1, df2, df3])
df_total[['name', 'A', 'B', 'C']]

Answer

What you're looking for is groupby.

For your example:

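import pandas as pd

# 1) and 2): mean and standard deviation of A, B, C and D for each name
df_total.groupby('name').mean()
df_total.groupby('name').std()
# 3): isnull() gives booleans, so the per-name mean is the fraction of NaNs;
# multiply by 100 to express it as a percentage of all rows for that name
df_total.set_index('name').isnull().groupby(level='name').mean() * 100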
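
If you want to try this end to end, here is a minimal self-contained sketch. It rebuilds a frame shaped like the one in the question, but the seeded random generator, the NaNs injected into column A, and the variable names (`mean_by_name`, `std_by_name`, `nan_pct_by_name`) are my own assumptions for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the question's data (values are made up).
rng = np.random.default_rng(0)
dates = pd.date_range('20151231', '20160130')
frames = []
for name in ['alpha', 'beta', 'gama']:
    part = pd.DataFrame(rng.standard_normal((len(dates), 4)),
                        index=dates, columns=list('ABCD'))
    part['name'] = name
    frames.append(part)
df_total = pd.concat(frames)

# Assumption for illustration: blank out column A in the first three
# 'alpha' rows so the NaN summary has something to report.
df_total.iloc[0:3, 0] = np.nan

cols = ['A', 'B', 'C', 'D']
grouped = df_total.groupby('name')[cols]

mean_by_name = grouped.mean()  # NaNs are skipped by default
std_by_name = grouped.std()    # sample standard deviation (ddof=1)

# isnull() yields booleans; their per-name mean is the NaN fraction.
nan_pct_by_name = (
    df_total.set_index('name')[cols].isnull().groupby(level='name').mean() * 100
)

print(mean_by_name)
print(std_by_name)
print(nan_pct_by_name)
```

Each result is a small table with one row per name and one column per attribute. Note that the percentage is taken over all rows for a name, not over the non-null count that `count()` returns.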