Caroline Caroline - 5 months ago 11
Python Question

Slicing MultiIndex (.loc not working)

I am having trouble slicing a MultiIndex. I have tried several techniques by now, but except for one I can't get them to work.
My DataFrame is composed of several rows like this:



means

Out[25]:
                              Total         DL         DM
Mouse Genotype Intensity
455   cre      s          15.114886  13.626841  16.602930
               w          41.419970  33.916706  48.923234
554   wt       s          19.348266  13.747603  24.948928
               w          41.563015  37.336228  45.789802

What I am trying to do is slice my df by Genotype.
The simplest way was of course slicing by row position, e.g.

means[0:2]


but as I have several DataFrames with more data than this, I am looking for a more elegant way.

means.loc[('cre')]


which has already worked for an identical DataFrame (with fewer rows, though), doesn't work for this one: it keeps giving me KeyError: 'cre'

When I tried using indexers and other approaches I stumbled upon, I kept getting the error that 'cre' is not in the index. The same happened when I tried to go by names instead of levels. I can't figure out why this happens.

It would be great if someone could help me with this! Thanks!

Answer

Solution 1: pd.IndexSlice

df.loc[pd.IndexSlice[:, 'cre', :], :]

                              Total         DL         DM
Mouse Genotype Intensity                                 
455   cre      s          15.114886  13.626841  16.602930
               w          41.419970  33.916706  48.923234
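For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame from the question and applies the same selection. Building the index with MultiIndex.from_tuples is only an assumption about how the data could be laid out, not how the asker created it. The sort_index() call is there because slicing a MultiIndex with slice objects expects a lexsorted index.

```python
import pandas as pd

# Rebuild the sample data from the question (the construction is illustrative).
index = pd.MultiIndex.from_tuples(
    [(455, "cre", "s"), (455, "cre", "w"), (554, "wt", "s"), (554, "wt", "w")],
    names=["Mouse", "Genotype", "Intensity"],
)
df = pd.DataFrame(
    {
        "Total": [15.114886, 41.419970, 19.348266, 41.563015],
        "DL": [13.626841, 33.916706, 13.747603, 37.336228],
        "DM": [16.602930, 48.923234, 24.948928, 45.789802],
    },
    index=index,
).sort_index()

idx = pd.IndexSlice
# Every Mouse, Genotype == 'cre', every Intensity; all columns.
cre_rows = df.loc[idx[:, "cre", :], :]
print(cre_rows)
```

The three positions inside idx[...] correspond to the three index levels in order, so the same pattern extends to filtering on Intensity or on several levels at once.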

Solution 2: swaplevel and loc

loc['cre'] looks up 'cre' in the first level of the MultiIndex, which here is Mouse, so it raises a KeyError. Swapping the levels so that Genotype comes first fixes this.

df.swaplevel(0, 1).loc['cre']

                     Total         DL         DM
Mouse Intensity                                 
455   s          15.114886  13.626841  16.602930
      w          41.419970  33.916706  48.923234
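The following sketch, again on a hand-built copy of the sample data (the construction is an assumption), shows both halves of this: the plain loc lookup failing because 'cre' is not a Mouse label, and the same lookup succeeding once Genotype is the first level.

```python
import pandas as pd

# Hand-built copy of the sample data from the question (illustrative only).
index = pd.MultiIndex.from_tuples(
    [(455, "cre", "s"), (455, "cre", "w"), (554, "wt", "s"), (554, "wt", "w")],
    names=["Mouse", "Genotype", "Intensity"],
)
df = pd.DataFrame(
    {
        "Total": [15.114886, 41.419970, 19.348266, 41.563015],
        "DL": [13.626841, 33.916706, 13.747603, 37.336228],
        "DM": [16.602930, 48.923234, 24.948928, 45.789802],
    },
    index=index,
)

# A bare label is matched against level 0 (Mouse), where 'cre' does not exist.
raised_key_error = False
try:
    df.loc["cre"]
except KeyError as exc:
    raised_key_error = True
    print("KeyError:", exc)

# With Genotype moved to the front, the same lookup finds the rows.
swapped = df.swaplevel(0, 1)
cre_rows = swapped.loc["cre"]
print(cre_rows)
```

Note that swaplevel returns a new object, so the original df keeps its level order.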

Solution 3: xs

print(df.xs('cre', level=1))

                     Total         DL         DM
Mouse Intensity                                 
455   s          15.114886  13.626841  16.602930
      w          41.419970  33.916706  48.923234

or

print(df.xs('cre', level=1, drop_level=False))

                              Total         DL         DM
Mouse Genotype Intensity                                 
455   cre      s          15.114886  13.626841  16.602930
               w          41.419970  33.916706  48.923234
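Since the question mentions going by names instead of levels, it may help to know that xs also accepts the level name in place of its position. A small sketch, assuming the index levels are named as in the question's output and using a hand-built copy of the data:

```python
import pandas as pd

# Hand-built copy of the sample data from the question (illustrative only).
index = pd.MultiIndex.from_tuples(
    [(455, "cre", "s"), (455, "cre", "w"), (554, "wt", "s"), (554, "wt", "w")],
    names=["Mouse", "Genotype", "Intensity"],
)
df = pd.DataFrame(
    {
        "Total": [15.114886, 41.419970, 19.348266, 41.563015],
        "DL": [13.626841, 33.916706, 13.747603, 37.336228],
        "DM": [16.602930, 48.923234, 24.948928, 45.789802],
    },
    index=index,
)

# Select by level name; the Genotype level is dropped from the result.
by_name = df.xs("cre", level="Genotype")

# Same selection, but keep all three index levels.
kept = df.xs("cre", level="Genotype", drop_level=False)

print(by_name)
print(kept)
```

Referring to the level by name keeps the call working even if the level order differs between DataFrames.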

Timing

xs seems to be the most performant of the three.
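To check this on your own data, a rough comparison can be sketched with the standard timeit module. The four-row frame below is a hand-built stand-in for the question's data, and the repeat count of 200 is arbitrary, so the numbers only hint at relative cost; results will vary with pandas version and frame size.

```python
import timeit

import pandas as pd

# Hand-built copy of the sample data from the question (illustrative only).
index = pd.MultiIndex.from_tuples(
    [(455, "cre", "s"), (455, "cre", "w"), (554, "wt", "s"), (554, "wt", "w")],
    names=["Mouse", "Genotype", "Intensity"],
)
df = pd.DataFrame(
    {
        "Total": [15.114886, 41.419970, 19.348266, 41.563015],
        "DL": [13.626841, 33.916706, 13.747603, 37.336228],
        "DM": [16.602930, 48.923234, 24.948928, 45.789802],
    },
    index=index,
).sort_index()

# The three solutions from this answer, wrapped so timeit can call them.
candidates = {
    "IndexSlice": lambda: df.loc[pd.IndexSlice[:, "cre", :], :],
    "swaplevel + loc": lambda: df.swaplevel(0, 1).loc["cre"],
    "xs": lambda: df.xs("cre", level=1),
}

runs = 200  # arbitrary repeat count for a quick comparison
timings = {name: timeit.timeit(fn, number=runs) for name, fn in candidates.items()}

# Print fastest first, as average microseconds per call.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:<16} {seconds / runs * 1e6:10.1f} us per call")
```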
