I want badges I want badges - 15 days ago 9
Python Question

Python Pandas: Get index of rows which column matches certain value

Given a DataFrame with a column "BoolCol", we want to find the indexes of the DataFrame in which the values for "BoolCol" == True

I currently have the iterating way to do it, which works perfectly:

for i in range(100,3000):
if df.iloc[i]['BoolCol']== True:
print i,df.iloc[i]['BoolCol']


But this is not the correct panda's way to do it.
After some research, I am currently using this code:

df[df['BoolCol'] == True].index.tolist()


This one gives me a list of indexes, but they dont match, when I check them by doing:

df.iloc[i]['BoolCol']


The result is actually False!!

Which would be the correct Pandas way to do this?

Answer

df.iloc[i] returns the ith row of df. i does not refer to the index value, i is a 0-based index.

In contrast, the attribute index returns actual index values, not numeric row-indices:

df[df['BoolCol'] == True].index.tolist()

or equivalently,

df[df['BoolCol']].index.tolist()

You can see the difference quite clearly by playing with a DataFrame with an "unusual" index:

df = pd.DataFrame({'BoolCol': [True, False, False, True, True]},
       index=[10,20,30,40,50])

In [53]: df
Out[53]: 
   BoolCol
10    True
20   False
30   False
40    True
50    True

[5 rows x 1 columns]

In [54]: df[df['BoolCol']].index.tolist()
Out[54]: [10, 40, 50]

If you want to use the index values,

In [56]: idx = df[df['BoolCol']].index.tolist()

In [57]: idx
Out[57]: [10, 40, 50]

then you can select the rows using loc instead of iloc:

In [58]: df.loc[idx]
Out[58]: 
   BoolCol
10    True
40    True
50    True

[3 rows x 1 columns]

Note that loc can also accept boolean arrays:

In [55]: df.loc[df['BoolCol']]
Out[55]: 
   BoolCol
10    True
40    True
50    True

[3 rows x 1 columns]

If you have a boolean array, mask and need ordinal index values, you can compute them using np.flatnonzero:

In [110]: np.flatnonzero(df['BoolCol'])
Out[112]: array([0, 3, 4])

Use df.iloc to select rows by ordinal index:

In [113]: df.iloc[np.flatnonzero(df['BoolCol'])]
Out[113]: 
   BoolCol
10    True
40    True
50    True
Comments