Duffau - 9 months ago 104

Python Question

I have the following `pd.DataFrame`:

    import pandas as pd

    df = pd.DataFrame({'name': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
                       'x1': [1, 2, 3, 4, 1, 2, 3, 4],
                       'x2': [4, 3, 2, 1, 4, 3, 2, 1]})

    > df
      name  x1  x2
    0    a   1   4
    1    a   2   3
    2    a   3   2
    3    a   4   1
    4    b   1   4
    5    b   2   3
    6    b   3   2
    7    b   4   1

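I would like to calculate a rolling mean of `x1` and `x2`, with a given `window` and `min_periods`, within each `name` group, where the `mean` in each row excludes that row's own `x1` and `x2` values (hence the shift by one). With `pd.rolling_mean` I can do this: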

    > df.groupby('name').apply(lambda x: pd.rolling_mean(x.shift(1), window=2, min_periods=1))
        x1   x2
    0  NaN  NaN
    1  1.0  4.0
    2  1.5  3.5
    3  2.5  2.5
    4  NaN  NaN
    5  1.0  4.0
    6  1.5  3.5
    7  2.5  2.5

Which is perfect: after the shift of length 1, rows 0 and 4 do not have any data within their name group, so the result there should be `np.nan`.
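To see where those two `np.nan` rows come from, it can help to look at the shifted values on their own. The following is only a minimal sketch, assuming a pandas version that provides `groupby(...).shift`; the variable name `shifted` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
                   'x1': [1, 2, 3, 4, 1, 2, 3, 4],
                   'x2': [4, 3, 2, 1, 4, 3, 2, 1]})

# Shift x1 and x2 down by one row *within* each name group.
# The first row of every group has no predecessor, so it becomes NaN.
shifted = df.groupby('name')[['x1', 'x2']].shift(1)
print(shifted)
```

Rows 0 and 4 are the first rows of groups `a` and `b`, so nothing is available for the rolling window there.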

In Pandas 0.19 and later, `rolling_mean` is deprecated and emits this warning:

    FutureWarning: pd.rolling_mean is deprecated for DataFrame and will be removed in a future version, replace with
        DataFrame.rolling(min_periods=1,center=False,window=2).mean()

Following the suggested replacement, I tried:

    df_shifted = df.groupby('name').apply(lambda x: x.shift(1))

    > df_shifted.groupby('name').rolling(window=2, min_periods=1).mean()
           name   x1   x2
    name
    a    1    a  1.0  4.0
         2    a  1.5  3.5
         3    a  2.5  2.5
    b    5    b  1.0  4.0
         6    b  1.5  3.5
         7    b  2.5  2.5

But this removes the `nan` rows and introduces a `MultiIndex`. Is there a nice one-line-kind-of-way of solving this while keeping the `nan` rows?
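One likely explanation for the missing rows: `x.shift(1)` also shifts the `name` column, so the first row of each group ends up with a missing key, and `groupby` leaves rows with a missing key out of every group by default. The sketch below uses a hypothetical miniature of the shifted frame (not the real `df_shifted`) to show that behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of a frame whose grouping column was shifted too:
# the first 'name' value is NaN.
shifted = pd.DataFrame({'name': [np.nan, 'a', 'a', 'a'],
                        'x1': [np.nan, 1.0, 2.0, 3.0]})

# By default, rows whose group key is NaN do not belong to any group,
# so row 0 is silently dropped from the grouped result.
group_sizes = shifted.groupby('name').size()
print(group_sizes)
```

Only three of the four rows are counted, which matches the way rows 0 and 4 vanish from the output above.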

The method should handle nan's like the 0.18-method. So if `x1 = [np.nan, 2, 3, 4, 1, 2, 3, 4]`, then the mean over a window containing `np.nan` and `2.0` should be `(np.nan + 2)/1 -> 2.0`, because of `min_periods`.
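One possible way to express these requirements is to shift and roll inside a per-group `transform`, which returns a result aligned to the original index. This is a sketch rather than a definitive solution: it assumes a pandas version that has the `.rolling()` method, and `lagged_rolling_mean` is a hypothetical helper name introduced here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
                   'x1': [np.nan, 2, 3, 4, 1, 2, 3, 4],
                   'x2': [4, 3, 2, 1, 4, 3, 2, 1]})


def lagged_rolling_mean(values, window=2, min_periods=1):
    """Hypothetical helper: rolling mean over the preceding rows only."""
    # Shift first so the current row is excluded, then average the window.
    # With min_periods=1 a window holding one NaN and one number still
    # yields that number; a window of only NaN yields NaN.
    return values.shift(1).rolling(window=window, min_periods=min_periods).mean()


# transform applies the helper within each name group and keeps the
# original flat index, so no rows are dropped and no MultiIndex appears.
result = df.groupby('name')[['x1', 'x2']].transform(lagged_rolling_mean)
print(result)
```

In this sketch row 2 of `x1` sees the shifted window `[np.nan, 2.0]` and evaluates to `2.0`, while rows 0 and 4 stay `np.nan`.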
