cd98 - 1 year ago
Python Question

pandas: selecting array of index labels with .loc

Consider this DataFrame:

import pandas as pd

df = pd.DataFrame({u'A': {2.0: 2.2,
                          7.0: 1.4,
                          8.0: 1.4,
                          9.0: 2.2},
                   u'B': {2.0: 7.2,
                          7.0: 6.3,
                          8.0: 4.4,
                          9.0: 5.0}})


Which looks like this:

     A    B
2  2.2  7.2
7  1.4  6.3
8  1.4  4.4
9  2.2  5.0


I'd like to select the rows with index labels 2 and 7 (numbers, not strings).

df.loc[[2, 7]]


gives an error!

IndexError: indices are out-of-bounds


However, df.loc[7] and df.loc[2] work fine and as expected. Also, if I define the DataFrame index with strings instead of numbers:

df2 = pd.DataFrame({u'A': {'2': 2.2,
                           '7': 1.4,
                           '8': 1.4,
                           '9': 2.2},
                    u'B': {'2': 7.2,
                           '7': 6.3,
                           '8': 4.4,
                           '9': 5.0}})

df2.loc[['2', '8']]


it works fine.

This is not the behavior I expected from df.loc (is it a bug or just a gotcha?).
Can I pass an array of numbers as index labels and not just as positions?

I can convert all indices to strings and then operate with .loc, but it would be very inconvenient for the rest of my code.

Thanks for your time!

Answer Source

This is a bug in pandas 0.12; version 0.13 fixes it. In other words, label selection with a list should work whether the labels are numbers or strings.
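For reference, here is a minimal sketch of the fixed behavior, assuming pandas 0.13 or later is installed. The frame is the one from the question; the names by_label and by_position are only illustrative.

```python
import pandas as pd

# Same data as in the question: a float index with labels 2, 7, 8 and 9.
df = pd.DataFrame({"A": {2.0: 2.2, 7.0: 1.4, 8.0: 1.4, 9.0: 2.2},
                   "B": {2.0: 7.2, 7.0: 6.3, 8.0: 4.4, 9.0: 5.0}})

# Label-based selection: 2 and 7 are looked up as index labels.
by_label = df.loc[[2, 7]]

# Position-based selection: 0 and 1 are row positions, which happen to
# point at the same two rows in this frame.
by_position = df.iloc[[0, 1]]

print(by_label)
print(by_position)
```

Both selections should pick the rows labelled 2 and 7. The difference is that .loc keeps working if the rows are reordered, while .iloc does not.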

You could do this (uses an internal method though):

In [10]: df.iloc[df.index.get_indexer([2,7])]
Out[10]: 
     A    B
2  2.2  7.2
7  1.4  6.3
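
One caveat with the workaround above: get_indexer marks labels that are not in the index with -1, and .iloc interprets -1 as "the last row", so a mistyped label would silently return the wrong row. The sketch below wraps the workaround in a small check. The helper name select_labels is hypothetical (it is not part of pandas), and the sketch assumes a unique index and Python 3.6+ for the f-string.

```python
import pandas as pd


def select_labels(frame, labels):
    """Hypothetical helper: return the rows of `frame` with the given index labels."""
    positions = frame.index.get_indexer(labels)
    # get_indexer uses -1 for labels that are absent from the index.
    # Passing -1 on to .iloc would select the last row, so fail loudly instead.
    missing = [label for label, pos in zip(labels, positions) if pos == -1]
    if missing:
        raise KeyError(f"labels not found in index: {missing}")
    return frame.iloc[positions]


# Same data as in the question.
df = pd.DataFrame({"A": {2.0: 2.2, 7.0: 1.4, 8.0: 1.4, 9.0: 2.2},
                   "B": {2.0: 7.2, 7.0: 6.3, 8.0: 4.4, 9.0: 5.0}})

subset = select_labels(df, [2, 7])
print(subset)
```

On a pandas version where df.loc[[2, 7]] already works, the plain .loc call is simpler and this helper is unnecessary.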