Trexion Kameha - 1 year ago

Python Question

In Python, I have time series data. Each row is keyed by a date and a name, and the data has four attributes: A, B, C, and D.

I need to compute some summary statistics on this dataset:

1) For each name, the mean of A, B, C, and D

2) For each name, the standard deviation of A, B, C, and D

3) For each name, the number of NaNs in each of A, B, C, and D as a percentage of that name's total rows
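For item 3, the denominator matters: the percentage should be NaNs divided by all rows, not by the non-missing ones. In pandas, `isna()` returns a boolean mask, and the mean of a boolean Series is the fraction of `True` values, so `isna().mean()` gives that ratio directly. A minimal illustration on a made-up four-value Series (both calculations give 50.0):

```python
import numpy as np
import pandas as pd

# Made-up example values: two of the four entries are missing
s = pd.Series([1.0, np.nan, 3.0, np.nan])

n_missing = s.isna().sum()   # number of NaNs
n_total = len(s)             # all rows, missing or not
pct_missing = n_missing / n_total * 100

# Equivalent shortcut: True counts as 1 and False as 0 when averaging
pct_missing_short = s.isna().mean() * 100

# count() skips NaNs, so it counts only the present values (the wrong denominator here)
n_present = s.count()

print(pct_missing, pct_missing_short, n_present)
```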

I am familiar with R but not Python. If you can point me in the right direction, that would be more than enough! Thank you.

```
import numpy as np
import pandas as pd

asof_dt = pd.date_range('20151231', '20160130')

# Three names, each with the same dates and four random columns
df1 = pd.DataFrame(np.random.randn(len(asof_dt), 4), index=asof_dt, columns=['A', 'B', 'C', 'D'])
df1['name'] = 'alpha'
df2 = pd.DataFrame(np.random.randn(len(asof_dt), 4), index=asof_dt, columns=['A', 'B', 'C', 'D'])
df2['name'] = 'beta'
df3 = pd.DataFrame(np.random.randn(len(asof_dt), 4), index=asof_dt, columns=['A', 'B', 'C', 'D'])
df3['name'] = 'gama'

df_total = pd.concat([df1, df2, df3])
df_total[['name', 'A', 'B', 'C']]
```
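Because `np.random.randn` never produces NaN, this sample frame will always report 0% missing for item 3. To exercise that part, one option is to blank out a random fraction of the numeric cells before combining the frames. This is a sketch: the 10% rate and the seed are arbitrary choices, not part of the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
asof_dt = pd.date_range('20151231', '20160130')
cols = ['A', 'B', 'C', 'D']

frames = []
for name in ['alpha', 'beta', 'gama']:
    values = rng.standard_normal((len(asof_dt), len(cols)))
    # Blank out roughly 10% of the A-D cells (assumed rate); 'name' stays intact
    values[rng.random(values.shape) < 0.1] = np.nan
    df = pd.DataFrame(values, index=asof_dt, columns=cols)
    df['name'] = name
    frames.append(df)

df_total = pd.concat(frames)
print(df_total[cols].isna().sum())
```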

Answer

What you're looking for is `groupby`.
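`groupby` follows the split-apply-combine pattern: rows are split by the key, a function is applied to each group, and the results are combined into one table indexed by the key. It plays roughly the role of `group_by()` followed by `summarise()` in R's dplyr. A small made-up frame makes the result easy to check by hand:

```python
import numpy as np
import pandas as pd

# Made-up data: two names, three rows each, one missing value in B
df = pd.DataFrame({
    'name': ['alpha', 'alpha', 'alpha', 'beta', 'beta', 'beta'],
    'A': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    'B': [4.0, np.nan, 6.0, 1.0, 1.0, 1.0],
})

# Split by name, take the mean of each column per group, combine into one table
means = df.groupby('name').mean()
print(means)
```

Note that `mean()` skips NaN by default, so alpha's mean for B is (4 + 6) / 2 = 5.0 rather than NaN.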

For your example:

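```
import pandas as pd
# 1) and 2): mean and standard deviation of A, B, C, D for each name
df_total.groupby('name').mean()
df_total.groupby('name').std()
# 3) NaNs in each column as a percentage of that name's rows (mean of a boolean mask)
df_total[['A', 'B', 'C', 'D']].isnull().groupby(df_total['name']).mean() * 100
```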
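If you'd rather have all three summaries in one table, `agg` accepts a list of functions and applies each one per group and per column. The sketch below uses a small helper, `pct_nan` (a hypothetical name, not part of pandas), for the NaN percentage; the frame is made-up data shaped like `df_total`.

```python
import numpy as np
import pandas as pd

def pct_nan(s):
    """Hypothetical helper: NaNs in s as a percentage of all its rows."""
    return s.isna().mean() * 100

# Made-up data shaped like df_total: a 'name' key plus numeric columns
df = pd.DataFrame({
    'name': ['alpha', 'alpha', 'beta', 'beta'],
    'A': [1.0, 3.0, np.nan, 5.0],
    'B': [2.0, np.nan, np.nan, 4.0],
})

# One row per name; columns are a MultiIndex of (variable, statistic)
summary = df.groupby('name')[['A', 'B']].agg(['mean', 'std', pct_nan])
print(summary)
```

`std` here is the sample standard deviation (`ddof=1`), matching R's `sd()`, so a group with only one non-missing value (alpha's B) gets NaN for its standard deviation.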