Mth Clv Mth Clv - 3 months ago 12
Python Question

Select/slice a multi-index dataframe time-series using a period leads to a bug?

I have a multi index which first level index is a time series in the very same way as the following one:

In[168]: rng = pd.date_range('01-01-2000',periods=50,freq='M')

In[169]: long_df = pd.DataFrame(np.random.randn(50,4),index = rng, columns=['bar','baz','foo','zoo'])

In[170]: long_df = long_df.stack()

In[171]: long_df[:10]
Out[171]:

2000-01-31 bar 2.079474
baz -0.569920
foo 1.149012
zoo -0.228926
2000-02-29 bar 0.429502
baz -0.117166
foo 0.956546
zoo -1.483818
2000-03-31 bar -1.137998
baz 1.049849


EDIT

I can slice it using periods and it works fine:

In[172]: long_df = long_df.sort_index()

In[173]: long_df.loc['2001']
Out[173]:
2001-01-31 bar -0.193987
baz 0.769297
foo 0.286880
zoo -1.431313
2001-02-28 bar -0.840502
baz 1.786758
foo 0.878356
zoo 0.433383
2001-03-31 bar 0.897548
baz 1.901540
foo 0.110606
zoo 0.571267
2001-04-30 bar -0.375377
baz 1.423742
foo -0.415006
zoo -0.141000
(...)


However, when I use the multiindex version I am working with the slicing is not being acknowledged:

In[204]: dfmi
Out[204]:
Last Days to expiry
Date Ticker
1988-12-06 HGF89 1.46894 52
HGF90 1.17100 419
HGG89 1.42100 80
HGH89 1.37344 113
HGH90 1.17450 477
HGK89 1.28750 171
HGK90 1.15900 539
HGN89 1.24550 233
HGN90 1.15900 598
HGU89 1.21750 295
HGU90 1.15900 659
HGZ89 1.18500 386
1988-12-07 HGF89 1.51900 51
HGF90 1.18900 418
HGG89 1.46394 79
HGH89 1.41300 112
HGH90 1.19250 476
HGK89 1.31750 170
HGK90 1.17700 538
HGN89 1.27550 232
HGN90 1.17700 597
HGU89 1.24250 294
HGU90 1.17700 658
HGZ89 1.20300 385
1988-12-08 HGF89 1.58100 50
HGF90 1.18900 417
HGG89 1.50894 78
HGH89 1.43994 111
HGH90 1.19250 475
HGK89 1.32750 169
... ...
2016-07-05 HGK7 2.20500 325
HGM7 2.20900 358
HGN6 2.18150 22
HGN7 2.21000 387
HGQ6 2.18150 55
HGQ7 2.21450 420
HGU6 2.18350 85
HGU7 2.21550 449
HGV6 2.18700 114
HGV7 2.21850 479
HGX6 2.19100 146
HGX7 2.22000 511
HGZ6 2.19250 176
2016-07-06 HGF7 2.16700 205
HGG7 2.17100 233
HGH7 2.17100 266
HGJ7 2.17550 294
HGK7 2.17650 324
HGM7 2.18050 357
HGN6 2.15150 21
HGN7 2.18150 386
HGQ6 2.15150 54
HGQ7 2.18600 419
HGU6 2.15350 84
HGU7 2.18700 448
HGV6 2.15700 113
HGV7 2.19000 478
HGX6 2.16100 145
HGX7 2.19150 510
HGZ6 2.16300 175

[167701 rows x 2 columns]

In[204]: dfmi = dfmi.sort_index()

In[205]: dfmi.loc['2001']
Out[206]:
Last Days to expiry
Date Ticker
1988-12-06 HGF89 1.46894 52
HGF90 1.17100 419
HGG89 1.42100 80
HGH89 1.37344 113
HGH90 1.17450 477
HGK89 1.28750 171
HGK90 1.15900 539
HGN89 1.24550 233
HGN90 1.15900 598
HGU89 1.21750 295
HGU90 1.15900 659
1988-12-07 HGF89 1.51900 51
HGF90 1.18900 418
HGG89 1.46394 79
HGH89 1.41300 112
HGH90 1.19250 476
HGK89 1.31750 170
HGK90 1.17700 538
HGN89 1.27550 232
HGN90 1.17700 597
HGU89 1.24250 294
HGU90 1.17700 658
1988-12-08 HGF89 1.58100 50
HGF90 1.18900 417
HGG89 1.50894 78
HGH89 1.43994 111
HGH90 1.19250 475
HGK89 1.32750 169
HGK90 1.17700 537
HGN89 1.27750 231
... ...
2016-07-05 HGH7 2.19950 267
HGJ7 2.20400 295
HGK7 2.20500 325
HGM7 2.20900 358
HGN6 2.18150 22
HGN7 2.21000 387
HGQ6 2.18150 55
HGQ7 2.21450 420
HGU6 2.18350 85
HGU7 2.21550 449
HGV6 2.18700 114
HGV7 2.21850 479
HGX6 2.19100 146
HGX7 2.22000 511
2016-07-06 HGF7 2.16700 205
HGG7 2.17100 233
HGH7 2.17100 266
HGJ7 2.17550 294
HGK7 2.17650 324
HGM7 2.18050 357
HGN6 2.15150 21
HGN7 2.18150 386
HGQ6 2.15150 54
HGQ7 2.18600 419
HGU6 2.15350 84
HGU7 2.18700 448
HGV6 2.15700 113
HGV7 2.19000 478
HGX6 2.16100 145
HGX7 2.19150 510

[161017 rows x 2 columns]


I noticed that there is a difference in type between the long_df (pandas.core.series.Series) I gave as an example and the df (pandas.core.frame.DataFrame) I use

What is the correct way to do it?

Thanks a lot for your tips,

Answer

You need add loc, but need last version of pandas 0.18.1:

print (long_df.loc['2001'])

2001-01-31  bar    1.684425
            baz    1.215258
            foo    0.158968
            zoo    0.689477
2001-02-28  bar   -0.123582
            baz    0.312533
            foo    0.609169
            zoo   -0.093985
2001-03-31  bar    0.372093
            baz   -0.281191
            foo   -0.400354
            zoo    0.646965
2001-04-30  bar   -0.287488
            baz   -0.928941
            foo    1.365416
            zoo    0.267282
2001-05-31  bar   -1.021086
            baz    0.317819
            foo   -0.393135
            zoo   -0.213589
2001-06-30  bar   -2.594173
...
...

EDIT:

Another solution is get_level_values from first level with get_loc for finding integer indexes:

import pandas as pd

long_df = pd.read_csv('test/testslice.csv', parse_dates=[0], index_col=[0,1])
dfmi = long_df.stack().sort_index()

print (dfmi.index.get_level_values(0))
DatetimeIndex(['1988-12-06', '1988-12-06', '1988-12-06', '1988-12-06',
               '1988-12-06', '1988-12-06', '1988-12-06', '1988-12-06',
               '1988-12-06', '1988-12-06',
               ...
               '2016-07-06', '2016-07-06', '2016-07-06', '2016-07-06',
               '2016-07-06', '2016-07-06', '2016-07-06', '2016-07-06',
               '2016-07-06', '2016-07-06'],
              dtype='datetime64[ns]', name='Date', length=335402, freq=None)

print (dfmi.index.get_level_values(0).get_loc('2001'))
slice(121844, 133684, None)
print (dfmi.iloc[dfmi.index.get_level_values(0).get_loc('2001')])

Date        Ticker                
2001-01-02  HGF01   Last                0.8180
                    Days to expiry     27.0000
            HGF02   Last                0.8180
                    Days to expiry    392.0000
            HGG01   Last                0.8165
                    Days to expiry     55.0000
            HGG02   Last                0.8180
                    Days to expiry    420.0000
            HGH01   Last                0.8115
                    Days to expiry     85.0000
            HGH02   Last                0.8180
                    Days to expiry    448.0000
            HGJ01   Last                0.8125
                    Days to expiry    114.0000
            HGJ02   Last                0.8170
                    Days to expiry    479.0000
            HGK01   Last                0.8135
                    Days to expiry    147.0000
            HGK02   Last                0.8160
                    Days to expiry    512.0000
            HGM01   Last                0.8145
                    Days to expiry    176.0000
            HGM02   Last                0.8155
                    Days to expiry    540.0000
            HGN01   Last                0.8155
                    Days to expiry    206.0000
            HGN02   Last                0.8140
                    Days to expiry    573.0000
            HGQ01   Last                0.8160
                    Days to expiry    239.0000
                                        ...   
2001-12-31  HGK03   Last                0.6960
                    Days to expiry    513.0000
            HGM02   Last                0.6680
                    Days to expiry    177.0000
            HGM03   Last                0.6980
                    Days to expiry    542.0000
            HGN02   Last                0.6710
                    Days to expiry    210.0000
            HGN03   Last                0.7005
                    Days to expiry    575.0000
            HGQ02   Last                0.6740
                    Days to expiry    240.0000
            HGQ03   Last                0.7030
                    Days to expiry    604.0000
            HGU02   Last                0.6770
                    Days to expiry    269.0000
            HGU03   Last                0.7050
                    Days to expiry    634.0000
            HGV02   Last                0.6795
                    Days to expiry    302.0000
            HGV03   Last                0.7080
                    Days to expiry    667.0000
            HGX02   Last                0.6820
                    Days to expiry    329.0000
            HGX03   Last                0.7110
                    Days to expiry    694.0000
            HGZ02   Last                0.6850
                    Days to expiry    361.0000
            HGZ03   Last                0.7140
                    Days to expiry    728.0000
dtype: float64
Comments