AMisra - 4 months ago
Python Question

Get the mean of all columns in a DataFrame and create a new DataFrame

I have a DataFrame containing only numeric values. I want to calculate the mean of every column and store the result in a new DataFrame.

The original DataFrame is indexed by a datetime field. The new DataFrame should be indexed by the same field, with a single index value equal to the last index value of the original DataFrame.

Code so far

mean_series = df.mean()
df_mean = pd.DataFrame(mean_series)
df_mean.rename(columns=lambda x: 'std_mean_' + x, inplace=True)


but this raises an error:

df_mean.rename(columns=lambda x: 'std_mean_'+ x, inplace=True)
TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('S21') dtype('S21') dtype('S21')

Answer

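Your question implies that you want a new DataFrame with a single row, indexed by the last timestamp of the original. The TypeError most likely occurs because the column labels are integers rather than strings, so they cannot be concatenated with a string prefix directly; converting each label with str() first avoids it, as the rename call at the end of the session below does.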

In [10]: df.head(10)
Out[10]: 
                            0         1         2         3
2011-01-01 00:00:00  0.182481  0.523784  0.718124  0.063792
2011-01-01 01:00:00  0.321362  0.404686  0.481889  0.524521
2011-01-01 02:00:00  0.514426  0.735809  0.433758  0.392824
2011-01-01 03:00:00  0.616802  0.149099  0.217199  0.155990
2011-01-01 04:00:00  0.525465  0.439633  0.641974  0.270364
2011-01-01 05:00:00  0.749662  0.151958  0.200913  0.219916
2011-01-01 06:00:00  0.665164  0.396595  0.980862  0.560119
2011-01-01 07:00:00  0.797803  0.377273  0.273724  0.220965
2011-01-01 08:00:00  0.651989  0.553929  0.769008  0.545288
2011-01-01 09:00:00  0.692169  0.261194  0.400704  0.118335

In [11]: df.tail()
Out[11]: 
                            0         1         2         3
2011-01-03 19:00:00  0.247211  0.539330  0.734206  0.781125
2011-01-03 20:00:00  0.278550  0.534943  0.804949  0.137291
2011-01-03 21:00:00  0.602246  0.108791  0.987120  0.455887
2011-01-03 22:00:00  0.003097  0.436435  0.987877  0.046066
2011-01-03 23:00:00  0.604916  0.670532  0.513927  0.610775


In [12]: df.mean()
Out[12]: 
0    0.495307
1    0.477509
2    0.562590
3    0.447997
dtype: float64

In [13]: new_df = pd.DataFrame(df.mean().to_dict(),index=[df.index.values[-1]])

In [14]: new_df
Out[14]: 
                            0         1        2         3
2011-01-03 23:00:00  0.495307  0.477509  0.56259  0.447997

In [15]: new_df.rename(columns=lambda c: "mean_"+str(c))
Out[15]: 
                       mean_0    mean_1   mean_2    mean_3
2011-01-03 23:00:00  0.495307  0.477509  0.56259  0.447997
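
For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch of the same idea. The random data, the seed and the variable names are assumptions made up for illustration, so the numbers will not match the output above. It also swaps in to_frame().T and add_prefix() for the to_dict() and rename() calls used in the session. These should give an equivalent result, and add_prefix() formats each label as a string, so integer column labels do not cause the TypeError from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 72 hourly rows and 4 numeric columns labelled 0..3.
rng = np.random.default_rng(0)
idx = pd.date_range("2011-01-01", periods=72, freq=pd.Timedelta(hours=1))
df = pd.DataFrame(rng.random((72, 4)), index=idx)

# Column means as a Series, turned into a one-row DataFrame.
new_df = df.mean().to_frame().T

# Index the single row by the last timestamp of the original frame.
# Slicing with [-1:] keeps the result a DatetimeIndex.
new_df.index = df.index[-1:]

# Prefix every column label; integer labels are formatted as strings.
new_df = new_df.add_prefix("mean_")

print(new_df)
```

Note that rename() without inplace=True and add_prefix() both return a new DataFrame, so the result has to be assigned back if you want to keep it.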