Luis Miguel Luis Miguel - 16 days ago 11
Python Question

slicing specific records from a pandas df core panel

I have a pandas dataframe core panel (data_r3000) that contains stock data for several industrial sectors...

{'capital_goods': <class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 13820 (major_axis) x 423 (minor_axis)
Items axis: OPEN to ADJ_CLOSE
Major_axis axis: 1962-01-02 00:00:00 to 2016-11-18 00:00:00
Minor_axis axis: A to ZEUS, 'consumer': <class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 11832 (major_axis) x 94 (minor_axis)
Items axis: OPEN to ADJ_CLOSE
Major_axis axis: 1970-01-02 00:00:00 to 2016-11-18 00:00:00
Minor_axis axis: ABG to WSO, 'consumer_non_durables': <class 'pandas.core.panel.Panel'>
Dimensions: 6 (items) x 13819 (major_axis) x 138 (minor_axis)


etc. I isolate one of the sectors, where I want to do some modifications to some of the values in the df.

x = data_r3000['capital_goods'].to_frame().unstack(level=1)


This produces the following df:

enter image description here

I have very little experience working with multiindex in pandas, and I ma having problems isolating the 'CLOSE' and 'ADJ_CLOSE' records for 'AA'. How can I isolate those records so I create a AA_df that cointans the timer-series of OPEN and ADJ_CLOSE only?

I have tried
x.xs(['CLOSE','ADJ_CLOSE'], axis=1),
which correctly isolates the two features I am looking for, still, I don't know how to isolate 'AA' only.
Thanks

Answer

I think you can use slicers:

idx = pd.IndexSlice
print (df.loc[:, idx[['CLOSE','ADJ_CLOSE'], 'AA']])

Or:

print (df.loc[:, (['CLOSE','ADJ_CLOSE'],'AA')])

Sample:

cols = pd.MultiIndex.from_product((['ADJ','ADJ_CLOSE', 'CLOSE'],
                                   ['A','AA','AEPI']))
df = pd.DataFrame(np.arange(27).reshape(3,9),columns=cols)

print (df)
  ADJ          ADJ_CLOSE          CLOSE         
    A  AA AEPI         A  AA AEPI     A  AA AEPI
0   0   1    2         3   4    5     6   7    8
1   9  10   11        12  13   14    15  16   17
2  18  19   20        21  22   23    24  25   26

idx = pd.IndexSlice
print (df.loc[:, idx[['CLOSE','ADJ_CLOSE'], 'AA']])
  ADJ_CLOSE CLOSE
         AA    AA
0         4     7
1        13    16
2        22    25

print (df.loc[:, (['CLOSE','ADJ_CLOSE'],'AA')])
  ADJ_CLOSE CLOSE
         AA    AA
0         4     7
1        13    16
2        22    25

Solution with Panel:

np.random.seed(1234)
rng = pd.date_range('1/1/2013',periods=10,freq='D')

data = np.random.randn(10, 4)

cols = ['A','AA','AAON','ABAX']

df1, df2, df3 = pd.DataFrame(data, rng, cols), 
                pd.DataFrame(data, rng, cols), 
                pd.DataFrame(data, rng, cols)

pf = pd.Panel({'OPEN':df1,'ADJ':df2,'ADJ_CLOSE':df3});pf
print (pf)
<class 'pandas.core.panel.Panel'>
Dimensions: 3 (items) x 10 (major_axis) x 4 (minor_axis)
Items axis: ADJ to OPEN
Major_axis axis: 2013-01-01 00:00:00 to 2013-01-10 00:00:00
Minor_axis axis: A to ABAX

print (pf.loc[['OPEN', 'ADJ_CLOSE'], :,'AA'])
                OPEN  ADJ_CLOSE
2013-01-01 -1.190976  -1.190976
2013-01-02  0.887163   0.887163
2013-01-03 -2.242685  -2.242685
2013-01-04 -2.021255  -2.021255
2013-01-05  0.289092   0.289092
2013-01-06 -0.655969  -0.655969
2013-01-07 -0.469305  -0.469305
2013-01-08  1.058969   1.058969
2013-01-09  1.045938   1.045938
2013-01-10 -0.322795  -0.322795
Comments