gavin gavin - 3 months ago 16
Python Question

What is Pandas doing here that my indexes [0] and [1] refer to the same value?

I have a dataframe with these indices and values:

df[df.columns[0]]

1 example

2 example1

3 example2


When I access df[df.columns[0]][2], I get "example1". Makes sense. That's how indices work.

When I access df[df.columns[0]][0], however, I get "example", and I get "example" when I access df[df.columns[0]][1] as well. So for

df[df.columns[0]][0]

df[df.columns[0]][1]


I get "example".

Strangely, I can delete "row" 0, and the result is that 1 is deleted:

gf = df.drop(df.index[[0]])

gf



exampleDF
2 example1

3 example2


But when I delete row 1, then

2 example1


is deleted, as opposed to example.

This is a bit confusing to me; are there inconsistent standards in Pandas regarding row indices, or am I missing something / made an error?

Answer

You are probably causing pandas to switch between .iloc (position based) and .loc (label based) indexing.

Python sequences are 0-indexed, but the index labels in your DataFrame start at 1. So when you run df[df.columns[0]][0], pandas finds no index label 0 and falls back to positional lookup, as .iloc does. It therefore returns whatever sits at the first position of the array, which is 'example'.

When you run df[df.columns[0]][1], however, pandas finds an index label 1 and does a label lookup, as .loc does. That returns whatever sits at that label, which again happens to be 'example'.

After you delete the first row, your DataFrame has neither index label 0 nor index label 1. So when you look up elements at those places the way you do, pandas does not return None; it falls back on positional indexing and returns the elements at positions 0 and 1 of the array.
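The drop behaviour in the question follows the same position-versus-label split. Here is a minimal sketch, assuming a one-column frame with index labels 1 to 3 that mirrors the question (the frame is reconstructed for illustration, not taken from the asker's data): df.index[[0]] picks a label by position, and drop then removes rows by that label.

```python
import pandas as pd

# Assumed reconstruction of the asker's frame: labels start at 1, not 0.
df = pd.DataFrame(
    {"exampleDF": ["example", "example1", "example2"]},
    index=[1, 2, 3],
)

# Indexing the Index object is positional: position 0 holds label 1.
labels_at_pos0 = df.index[[0]]
gf = df.drop(labels_at_pos0)   # drop() removes rows by label, so label 1 goes

# Position 1 holds label 2, so this removes the "example1" row.
labels_at_pos1 = df.index[[1]]
hf = df.drop(labels_at_pos1)

print(list(gf.index))
print(list(hf.index))
```

So "deleting row 0" removes label 1 and "deleting row 1" removes label 2, which matches what the question reports.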

To force pandas to use one of the two indexing techniques, use .iloc or .loc explicitly. .loc is label based and will raise a KeyError if you try df[df.columns[0]].loc[0]. .iloc is position based and will return 'example' when you try df[df.columns[0]].iloc[0].
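A small sketch of the explicit accessors, again assuming a reconstructed frame with index labels 1 to 3 (the data and variable names are illustrative only). Neither accessor relies on a fallback:

```python
import pandas as pd

# Assumed reconstruction of the asker's frame: labels start at 1, not 0.
df = pd.DataFrame(
    {"exampleDF": ["example", "example1", "example2"]},
    index=[1, 2, 3],
)
col = df[df.columns[0]]

first_by_position = col.iloc[0]   # position 0, whatever its label is
first_by_label = col.loc[1]       # the row labelled 1

try:
    col.loc[0]                    # no row is labelled 0
    missing_label_raises = False
except KeyError:
    missing_label_raises = True

print(first_by_position, first_by_label, missing_label_raises)
```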


Additional note

These commands are bad practice: df[col_label].iloc[row_index]; df[col_label].loc[row_label].

Please use df.loc[row_label, col_label] or df.iloc[row_index, col_index] instead. df.ix[row_label_or_index, col_label_or_index] also worked in older pandas, but .ix was deprecated in pandas 0.20 and removed in pandas 1.0.
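As a rough illustration of the single-call form, assuming the same reconstructed frame as in the question (names and data are illustrative), both lookups below select the row and column in one step rather than chaining two:

```python
import pandas as pd

# Assumed reconstruction of the asker's frame: labels start at 1, not 0.
df = pd.DataFrame(
    {"exampleDF": ["example", "example1", "example2"]},
    index=[1, 2, 3],
)

by_label = df.loc[2, "exampleDF"]   # row label 2, column label "exampleDF"
by_position = df.iloc[1, 0]         # second row, first column

print(by_label, by_position)
```

Both expressions pick out the same cell here, because label 2 sits at position 1.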

See Different Choices for Indexing for more information.