L. Allart - 4 years ago
Python Question

Python/Pandas: Unexpected indexes when doing a groupby-apply

This is my first post on Stack Overflow, so... cheers!

I'm using pandas and numpy on Python 3 with the following versions:


  • Python 3.5.1 (via Anaconda 2.5.0) 64 bits

  • Pandas 0.19.1

  • numpy 1.11.2 (probably not relevant here)



Here is the minimal code producing the problem:

import pandas as pd
import numpy as np

a = pd.DataFrame({'i' : [1,1,1,1,1], 'a': [1,2,5,6,100], 'b': [2, 4,10, np.nan, np.nan]})
a.set_index(keys='a', inplace=True)
v = a.groupby(level=0).apply(lambda x: x.sort_values(by='i')['b'].rolling(2, min_periods=0).mean())
v.index.names


This code is a simple groupby-apply, but I don't understand the outcome:

FrozenList(['a', 'a'])


For some reason, the index names of the result are ['a', 'a'], which seems like a very questionable choice by pandas. I would have expected a simple ['a'].

Does anyone have any insight into why pandas chooses to duplicate the level in the index?

Thanks in advance,

Answer Source

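This is happening because the function passed to apply returns a Series (or DataFrame) that carries its own index. When apply combines the per-group results, it prepends the group key to that index, so the original index 'a' ends up nested under the group level 'a'. The same thing happens if you call shift on the 'b' column: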

In [99]:
v = a.groupby(level=0).apply(lambda x: x['b'].shift())
v

Out[99]:
a    a  
1    1     NaN
2    2     NaN
5    5     NaN
6    6     NaN
100  100   NaN
Name: b, dtype: float64

Even with as_index=False it still produces a MultiIndex; the outer level is just replaced by integer positions:

In [102]:
v = a.groupby(level=0, as_index=False).apply(lambda x: x['b'].shift())
v

Out[102]:
   a  
0  1     NaN
1  2     NaN
2  5     NaN
3  6     NaN
4  100   NaN
Name: b, dtype: float64

If the lambda returns a plain scalar value, no duplicate index level is created:

In [104]:
v = a.groupby(level=0).apply(lambda x: x['b'].max())
v

Out[104]:
a
1       2.0
2       4.0
5      10.0
6       NaN
100     NaN
dtype: float64

I don't think this is a bug. Rather, it is a piece of semantics to be aware of: when the applied function returns an object that has its own index, that index is kept and combined with the group keys.
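If a single-level index is what you want, here is a minimal sketch of two possible ways to get it, reusing the frame from the question. The helper name rolled and the variables flat and dropped are only illustrative. It assumes that passing group_keys=False to groupby stops apply from prepending the group keys, and that dropping the outer level by position is acceptable for your data; the exact behaviour may vary between pandas versions.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'i': [1, 1, 1, 1, 1],
                  'a': [1, 2, 5, 6, 100],
                  'b': [2, 4, 10, np.nan, np.nan]}).set_index('a')


def rolled(x):
    # Same computation as in the question; returns a Series that keeps
    # the group's own index ('a').
    return x.sort_values(by='i')['b'].rolling(2, min_periods=0).mean()


# Default behaviour: the group key is prepended to the returned index.
v = a.groupby(level=0).apply(rolled)
print(v.index.names)

# Option 1: ask groupby not to prepend the group keys.
flat = a.groupby(level=0, group_keys=False).apply(rolled)
print(flat.index.names)

# Option 2: drop the outer level afterwards. The level is selected by
# position because both levels share the name 'a'.
dropped = v.copy()
dropped.index = v.index.droplevel(0)
print(dropped.index.names)
```

Option 1 avoids building the extra level in the first place, while option 2 is handy when you already have the result and only need to clean up its index.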
