Huey - 11 months ago 118

Python Question

I'm looking at the tutorials on window functions, but I don't quite understand why the following code produces NaNs.

If I understand correctly, the code creates a rolling window of size 2. Why do the first, fourth, and fifth rows have NaN? At first, I thought it was because adding NaN to another number produces NaN, but then I'm not sure why the second row isn't NaN as well.

```
dft = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
                   index=pd.date_range('20130101 09:00:00', periods=5, freq='s'))

In [58]: dft.rolling(2).sum()
Out[58]:
                       B
2013-01-01 09:00:00  NaN
2013-01-01 09:00:01  1.0
2013-01-01 09:00:02  3.0
2013-01-01 09:00:03  NaN
2013-01-01 09:00:04  NaN
```

Answer Source

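The first thing to notice is that, by default, `rolling` with an integer window needs n valid (non-NaN) observations in each window, where n is the window size: the current row plus the n-1 rows before it. If that condition is not met, it returns NaN for the window. This is what happens at the first row, which has no prior row. The second and third rows each see two valid numbers, so they get a sum. The fourth and fifth rows are NaN because one of the two values in each of those windows is NaN, which leaves only one valid observation.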
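One way to check this is to count the non-NaN values in each window and compare that count with the window size. The following is only a sketch: it rebuilds `dft` from the question, and the names `window`, `valid`, and `result` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
dft = pd.DataFrame(
    {"B": [0, 1, 2, np.nan, 4]},
    index=pd.date_range("20130101 09:00:00", periods=5, freq="s"),
)

window = 2

# Count the non-NaN values in each trailing window of two rows.
# min_periods=1 is used here only so that the count itself is never NaN.
valid = dft["B"].notna().astype(int).rolling(window, min_periods=1).sum()

# Default behaviour: every window must hold `window` valid values.
result = dft["B"].rolling(window).sum()

# Rows with fewer than `window` valid values are the rows that come out NaN.
print(pd.DataFrame({"valid": valid, "sum": result}))
```

The rows where `valid` is below 2 (the first, fourth, and fifth) should be the same rows where `sum` is NaN.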

If you would like to avoid returning NaN, you can pass `min_periods=1` to the method, which reduces the minimum number of valid observations required in the window from 2 to 1. Any NaN inside a window is then skipped, and the remaining values are summed:

```
>>> dft.rolling(2, min_periods=1).sum()
                       B
2013-01-01 09:00:00  0.0
2013-01-01 09:00:01  1.0
2013-01-01 09:00:02  3.0
2013-01-01 09:00:03  2.0
2013-01-01 09:00:04  4.0
```
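To make the role of `min_periods` concrete, here is a rough pure-Python sketch of the rule. It is not how pandas implements rolling sums internally; `rolling_sum` is a hypothetical helper written only for this illustration, and it assumes a fixed integer window that ends at the current row.

```python
import math

import numpy as np
import pandas as pd


def rolling_sum(values, window, min_periods=None):
    """Hypothetical helper: trailing-window sum that skips NaN values."""
    if min_periods is None:
        min_periods = window  # assumed default for an integer window
    out = []
    for i in range(len(values)):
        # Current row plus up to window-1 rows before it.
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        # Emit a sum only when enough valid observations are present.
        out.append(float(sum(valid)) if len(valid) >= min_periods else math.nan)
    return out


data = [0, 1, 2, np.nan, 4]
strict = rolling_sum(data, 2)                  # needs 2 valid values
relaxed = rolling_sum(data, 2, min_periods=1)  # needs only 1 valid value

# Compare against pandas on the same values.
print(np.allclose(strict, pd.Series(data).rolling(2).sum(), equal_nan=True))
print(np.allclose(relaxed, pd.Series(data).rolling(2, min_periods=1).sum()))
```

With `min_periods=1`, the window at the fourth row holds 2 and NaN, so only the 2 is summed; at the fifth row only the 4 is summed.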