toasteez toasteez - 1 month ago 10
Python Question

Using IndexSlice to filter MultiIndex Dataframes with Pandas

Question: How do I filter the rows such that I return only the rows where the injection is not equal to 0 or NaN and not lose the values of the other columns?

I have a dataframe created with the following code:

import pandas as pd

df=pd.DataFrame(
[
[5777, 100, 5385, 200, 5419, 4887, 100, 200],
[4849, 0, 4539, 0, 3381, 0, 0, ],
[4971, 0, 3824, 0, 4645, 3424, 0, 0, ],
[4827, 200, 3459, 300, 4552, 3153, 100, 200, ],
[5207, 0, 3670, 0, 4876, 3358, 0, 0, ],
],
index=pd.to_datetime(['2010-01-01',
'2010-01-02',
'2010-01-03',
'2010-01-04',
'2010-01-05']),
columns=pd.MultiIndex.from_tuples(
[('Portfolio A', 'GBP', 'amount'),
('Portfolio A', 'GBP', 'injection'),
('Portfolio B', 'EUR', 'amount'), ('Portfolio B', 'EUR', 'injection'),
('Portfolio C', 'USD', 'amount'), ('Portfolio C', 'USD', 'injection'),
('Portfolio D', 'JPY', 'amount'), ('Portfolio D', 'JPY', 'injection')])
).sortlevel(axis=1)


Next I can create a DataFrame with the slice of the data (in this instance it's all the data)

df1=df.loc[pd.IndexSlice[:], pd.IndexSlice[:,:, ['amount', 'injection']]]


Next create a new DataFrame where injection is != 0

df2=df1[df1.loc[pd.IndexSlice[:], pd.IndexSlice[:, :, 'injection']]!=0]


Question: Why does this reset all values in 'amount' column to NaN?

The next step once the amounts are available is to remove rows with all NaN

df3=df2.dropna(axis=0, how='all', thresh=None, subset=None, inplace=False)


The desired output is all data across row indexes:

2010-01-01
2010-01-03
2010-01-04
2010-01-05

Answer

I think you need add fillna with any for check at least one True values if need boolean indexing which works with mask as boolean Series:

print (df1.loc[:, pd.IndexSlice[:, :, 'injection']].fillna(0) != 0)
           Portfolio A Portfolio B Portfolio C Portfolio D
                   GBP         EUR         USD         JPY
             injection   injection   injection   injection
2010-01-01        True        True        True        True
2010-01-02       False       False       False       False
2010-01-03       False       False        True       False
2010-01-04        True        True        True        True
2010-01-05       False       False        True       False

mask = (df1.loc[:, pd.IndexSlice[:, :, 'injection']].fillna(0) != 0).any(axis=1)
print (mask)
2010-01-01     True
2010-01-02    False
2010-01-03     True
2010-01-04     True
2010-01-05     True
dtype: bool

print (df1[mask])
           Portfolio A           Portfolio B           Portfolio C            \
                   GBP                   EUR                   USD             
                amount injection      amount injection      amount injection   
2010-01-01        5777       100        5385       200        5419      4887   
2010-01-03        4971         0        3824         0        4645      3424   
2010-01-04        4827       200        3459       300        4552      3153   
2010-01-05        5207         0        3670         0        4876      3358   

           Portfolio D            
                   JPY            
                amount injection  
2010-01-01         100     200.0  
2010-01-03           0       0.0  
2010-01-04         100     200.0  
2010-01-05           0       0.0  

If use mask as boolean DataFrame get NaN where False values.