Duffau - 1 year ago 246
Python Question

# Pandas DataFrame: How to calculate the rolling mean of both grouped and shifted data in version >0.19

I have the followig

`pd.DataFrame`
:

``````import pandas as pd

df = pd.DataFrame({'name': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
'x1': [1, 2, 3, 4, 1, 2, 3, 4],
'x2': [4, 3, 2, 1, 4, 3, 2, 1]
})

> df
name  x1  x2
0    a   1   4
1    a   2   3
2    a   3   2
3    a   4   1
4    b   1   4
5    b   2   3
6    b   3   2
7    b   4   1
``````

I would like to calculate a rolling mean of
`x1`
and
`x2`
with
`window`
-size of 2 and
`min_periods`
of 1. The mean should be grouped by the
`name`
and the input to the
`mean`
-function should be shifted by one row, that is, the resulting row with index 2, should be calculated from rows (0,1). So for
`x1`
the rolling mean in row 2 should be (1+2)/2 = 1.5.

In Pandas version <= 0.18 I would do this:

``````> df.groupby('name').apply(lambda x: pd.rolling_mean(x.shift(1), window=2, min_periods=1))
x1   x2
0  NaN  NaN
1  1.0  4.0
2  1.5  3.5
3  2.5  2.5
4  NaN  NaN
5  1.0  4.0
6  1.5  3.5
7  2.5  2.5
``````

Which is perfect, since row 0 and row 4 do not a any data, within each name group, of length 1, and the result should be
`np.nan`
.

In Pandas 0.19 and later the
`rolling_mean`
-function and functions alike, are throwing:

``````FutureWarning: pd.rolling_mean is deprecated for DataFrame and will be removed in a future version, replace with
DataFrame.rolling(min_periods=1,center=False,window=2).mean()
``````

So in Pandas version >= 0.19 this is the best approach I could come up with:

``````df_shifted = df.groupby('name').apply(lambda x: x.shift(1))
> df_shifted.groupby('name').rolling(window=2, min_periods=1).mean()
name   x1   x2
name
a    1    a  1.0  4.0
2    a  1.5  3.5
3    a  2.5  2.5
b    5    b  1.0  4.0
6    b  1.5  3.5
7    b  2.5  2.5
``````

But this removes the
`nan`
-rows which I would like to keep for array-dimension reasons and returns a DataFrame with
`MultiIndex`
.

Is there a nice one-line-kind-of-way of solving this while keeping the
`nan`
-rows and returning a DataFrame with a flat index?

EDIT
The method should handle nan's like the 0.18-method. So if
`x1 = [np.nan, 2, 3, 4, 1, 2, 3, 4]`
the rolling mean at index 1 should return
`np.nan`
, but the rolling mean at index 2 should return
`2.0`
, since
`(np.nan + 2)/1 -> 2.0`
that is number of non-nan's is less or equal to
`min_periods`
.

To avoid the Deprecation warnings, starting with version `0.19.1`, you can rewrite the syntax as shown:
1. shift the `DF` by 1 level
``````df.groupby('name').apply(lambda x: x.shift().rolling(window=2, min_periods=1).mean()