zaphod zaphod - 2 months ago 7
Python Question

how to get the average of dataframe column values

A B
DATE
2013-05-01 473077 71333
2013-05-02 35131 62441
2013-05-03 727 27381
2013-05-04 481 1206
2013-05-05 226 1733
2013-05-06 NaN 4064
2013-05-07 NaN 41151
2013-05-08 NaN 8144
2013-05-09 NaN 23
2013-05-10 NaN 10


say i have the dataframe above. what is the easiest way to get a series with the same index which is the average of the columns A and B? the average needs to ignore NaN values. the twist is that this solution needs to be flexible to the addition of new columns to the dataframe.

the closest i have come was

df.sum(axis=1) / len(df.columns)


however, this does not seem to ignore the NaN values

(note: i am still a bit new to the pandas library, so i'm guessing there's an obvious way to do this that my limited brain is simply not seeing)

DSM DSM
Answer

Simply using df.mean() will Do The Right Thing(tm) with respect to NaNs:

>>> df
                 A      B
DATE                     
2013-05-01  473077  71333
2013-05-02   35131  62441
2013-05-03     727  27381
2013-05-04     481   1206
2013-05-05     226   1733
2013-05-06     NaN   4064
2013-05-07     NaN  41151
2013-05-08     NaN   8144
2013-05-09     NaN     23
2013-05-10     NaN     10
>>> df.mean(axis=1)
DATE
2013-05-01    272205.0
2013-05-02     48786.0
2013-05-03     14054.0
2013-05-04       843.5
2013-05-05       979.5
2013-05-06      4064.0
2013-05-07     41151.0
2013-05-08      8144.0
2013-05-09        23.0
2013-05-10        10.0
dtype: float64

You can use df[["A", "B"]].mean(axis=1) if there are other columns to ignore.

Comments