LateCoder, 4 years ago
Python Question

pandas rolling apply doesn't do anything

I have a DataFrame like this:

import pandas as pd

df2 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'value': ['a', 'b', 'a']})

         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a


I'm trying to understand how to apply a custom rolling function to it. I've tried doing this:

df2.rolling(2).apply(lambda x: 1)


But this gives me the original DataFrame back:

         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a


If I have a different DataFrame, like this:

df3 = pd.DataFrame({'a': [1, 2, 3], 'value': [4, 5, 6]})


The same rolling apply seems to work:

df3.rolling(2).apply(lambda x: 1)

     a  value
0  NaN    NaN
1  1.0    1.0
2  1.0    1.0


Why doesn't this work for the first DataFrame?

Pandas version: 0.20.2

Python version: 2.7.10

Update

So, I've realized that df2's columns are object-type, whereas the output of my lambda function is an integer. df3's columns are both integer columns. I'm assuming that this is why the apply isn't working.
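One way to confirm the difference between the two frames is to ask pandas whether it treats each column as numeric. This is only a minimal sketch: it uses pandas.api.types.is_numeric_dtype, and the names numeric_df2 and numeric_df3 are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df2 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'value': ['a', 'b', 'a']})
df3 = pd.DataFrame({'a': [1, 2, 3], 'value': [4, 5, 6]})

# Map each column name to whether pandas considers its dtype numeric.
# (numeric_df2 / numeric_df3 are illustrative names, not from the question.)
numeric_df2 = {col: is_numeric_dtype(df2[col]) for col in df2.columns}
numeric_df3 = {col: is_numeric_dtype(df3[col]) for col in df3.columns}

print(numeric_df2)  # {'date': False, 'value': False}
print(numeric_df3)  # {'a': True, 'value': True}
```

Neither column of df2 is numeric, while both columns of df3 are, which is consistent with the behaviour described above.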

However, returning a string from the lambda doesn't work either:

df2.rolling(2).apply(lambda x: 'a')

         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a


Furthermore, say I want to concatenate the characters in the value column on a rolling basis, so that the output of the lambda function is a string, rather than an integer. The following also doesn't work:

df2.rolling(2).apply(lambda x: '.'.join(x))

         date value
0  2015-01-01     a
1  2015-01-02     b
2  2015-01-03     a


What's going on here? Can rolling operations be applied to object-type columns in pandas?

Answer Source

Here is one way to approach this. Note that pandas' rolling is a wrapper around numpy methods and inherits their efficiency; the code below is not that. It merely provides a similar API that allows rolling over non-numeric columns:

Code:

import pandas as pd

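class MyDataFrame(pd.DataFrame):
    """DataFrame subclass with a rolling helper for non-numeric columns."""

    @property
    def _constructor(self):
        # Make pandas operations return MyDataFrame rather than DataFrame.
        return MyDataFrame

    def rolling_object(self, window, column, default):
        # Place the column next to copies of itself shifted by 0..window-1
        # rows, fill the gaps left by shifting with `default`, then transpose
        # so that each column of the result holds one window.
        return pd.concat(
            [self[column].shift(i) for i in range(window)],
            axis=1).fillna(default).T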

This creates a custom dataframe class that has a rolling_object method. It departs from the usual pandas conventions in that it operates on only a single column at a time.

Test Code:

df2 = MyDataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                   'value': ['a', 'b', 'c'],
                   'num': [1, 2, 3]
                   })

print(df2.rolling_object(2, 'value', '').apply(lambda x: '.'.join(x)))

Results:

0     a.
1    b.a
2    c.b
dtype: object
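
To make the shift-and-concat step easier to follow, here is the same idea as a plain function, without subclassing DataFrame. Treat it as a sketch rather than a drop-in replacement: rolling_windows is a hypothetical helper name, and it keeps one row per position instead of transposing, so the join is applied with axis=1.

```python
import pandas as pd

def rolling_windows(series, window, default):
    """Hypothetical helper: one row per position, one column per lag."""
    lagged = [series.shift(i) for i in range(window)]
    # keys=... labels the columns 0..window-1 (column 0 is the current row,
    # column 1 the previous row, and so on).
    return pd.concat(lagged, axis=1, keys=list(range(window))).fillna(default)

values = pd.Series(['a', 'b', 'c'], name='value')
windows = rolling_windows(values, 2, '')
print(windows)

# Join each row's window; the newest value comes first, as in the answer.
joined = windows.apply(lambda row: '.'.join(row), axis=1)
print(joined.tolist())  # ['a.', 'b.a', 'c.b']
```

Like the class above, this loops in Python for every row, so it should not be expected to match the speed of the built-in numeric rolling operations.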