Sosi Sosi -4 years ago 208
Python Question

Replacing values in a 2nd level column on MultiIndex df in Pandas

I was looking into this post which almost solved my problem. However, in my case, I want to work based on the 2nd level of the df, but trying not to specify my 1st level column names explicitly.

Borrowing the original dataframe:

df = pd.DataFrame({('A','a'): [-1,-1,0,10,12],
('A','b'): [0,1,2,3,-1],
('B','a'): [-20,-10,0,10,20],
('B','b'): [-200,-100,0,100,200]})

##df
A B
a b a b
0 -1 0 -20 -200
1 -1 1 -10 -100
2 0 2 0 0
3 10 3 10 100
4 12 -1 20 200


I want to assign
NA
to all columns
a
and
b
where
b<0
. I was selecting them based on:
df.xs('b',axis=1,level=1)<b
, but then I cannot actually perform the replace. However, I have varying 1st level names, so the indexing there cannot be made based on
A
and
B
explicitly, but possibly through
df.columns.values
?

The desired output would be

##df
A B
a b a b
0 -1 0 NA NA
1 -1 1 NA NA
2 0 2 0 0
3 10 3 10 100
4 NA NA 20 200


I appreciate all tips, thank you in advance.

Answer Source

You can use DataFrame.mask with reindex for same index and column names as original DataFrame created by reindex:

mask = df.xs('b',axis=1,level=1) < 0
print (mask)
       A      B
0  False   True
1  False   True
2  False  False
3  False  False
4   True  False

print (mask.reindex(columns = df.columns, level=0))
       A             B       
       a      b      a      b
0  False  False   True   True
1  False  False   True   True
2  False  False  False  False
3  False  False  False  False
4   True   True  False  False

df = df.mask(mask.reindex(columns = df.columns, level=0))
print (df)
      A          B       
      a    b     a      b
0  -1.0  0.0   NaN    NaN
1  -1.0  1.0   NaN    NaN
2   0.0  2.0   0.0    0.0
3  10.0  3.0  10.0  100.0
4   NaN  NaN  20.0  200.0

Edit by OP: I had asked in comments how to consider multiple conditions (e.g. df.xs('b',axis=1,level=1) < 0 OR df.xs('b',axis=1,level=1) being an NA). @Jezrael kindly indicated that if I wanted to do this, I should consider

mask=(df.xs('b',axis=1,level=1) < 0 | df.xs('b',axis=1,level=1).isnull())
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download