cd98 - 1 month ago
Python Question

pandas: selecting array of index labels with .loc

Consider this DataFrame:

import pandas as pd

df = pd.DataFrame({u'A': {2.0: 2.2, 7.0: 1.4, 8.0: 1.4, 9.0: 2.2},
                   u'B': {2.0: 7.2, 7.0: 6.3, 8.0: 4.4, 9.0: 5.0}})


Which looks like this:

     A    B
2  2.2  7.2
7  1.4  6.3
8  1.4  4.4
9  2.2  5.0


I'd like to select the rows with index labels 2 and 7 (numbers, not strings):

df.loc[[2, 7]]


raises an error:

IndexError: indices are out-of-bounds


However, df.loc[7] and df.loc[2] work fine and as expected. Also, if I define the DataFrame index with strings instead of numbers:

df2 = pd.DataFrame({u'A': {'2': 2.2, '7': 1.4, '8': 1.4, '9': 2.2},
                    u'B': {'2': 7.2, '7': 6.3, '8': 4.4, '9': 5.0}})

df2.loc[['2', '8']]


it works fine.

This is not the behavior I expected from df.loc. Is it a bug or just a gotcha? Can I pass an array of numbers as index labels, and not just as positions?

I could convert all the index labels to strings and then use .loc, but that would be very inconvenient for the rest of my code.

Thanks for your time!

Answer

This is a bug in pandas 0.12; version 0.13 fixes it. In other words, label selection should work when you pass a list, whether the labels are numbers or strings.
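To make the label-versus-position distinction concrete, here is a minimal sketch that assumes a pandas version where the fix is in place (0.13 or later, per the above). The DataFrame is rebuilt from the question's data, and the names by_label and by_position are only illustrative.

```python
import pandas as pd

# Same data as in the question, with a numeric (float) index.
df = pd.DataFrame(
    {"A": [2.2, 1.4, 1.4, 2.2], "B": [7.2, 6.3, 4.4, 5.0]},
    index=[2.0, 7.0, 8.0, 9.0],
)

by_label = df.loc[[2, 7]]      # rows whose index labels are 2 and 7
by_position = df.iloc[[0, 1]]  # first and second rows, whatever their labels

# For this frame both selections should pick the same two rows.
print(by_label.equals(by_position))
```

Here the two calls happen to agree only because labels 2 and 7 sit at positions 0 and 1; .loc always interprets the list as labels, and .iloc always as positions.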

As a workaround, you could do this (it uses an internal method, though):

In [10]: df.iloc[df.index.get_indexer([2,7])]
Out[10]: 
     A    B
2  2.2  7.2
7  1.4  6.3
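One caveat with this workaround is that get_indexer returns -1 for any label it cannot find, and .iloc would then quietly return the last row instead of failing. The sketch below wraps the same idea in a small helper that raises instead. select_labels is a hypothetical name, not part of pandas, and the sketch assumes the index has no duplicate labels.

```python
import pandas as pd


def select_labels(frame, labels):
    """Select rows by index label via integer positions (hypothetical helper)."""
    # Translate labels to integer positions; assumes a unique index.
    positions = frame.index.get_indexer(labels)
    # get_indexer marks labels it cannot find with -1; .iloc would silently
    # treat -1 as "last row", so fail loudly instead.
    missing = [label for label, pos in zip(labels, positions) if pos == -1]
    if missing:
        raise KeyError(f"labels not found in index: {missing}")
    return frame.iloc[positions]


# Same data as in the question.
df = pd.DataFrame(
    {"A": [2.2, 1.4, 1.4, 2.2], "B": [7.2, 6.3, 4.4, 5.0]},
    index=[2.0, 7.0, 8.0, 9.0],
)
result = select_labels(df, [2, 7])
print(result)
```

Calling select_labels(df, [2, 3]) raises a KeyError, because 3 is not an index label in this frame.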