Stacey - 3 months ago
Python Question

Conditional Iteration over a dataframe

I have a dataframe df which looks like:

              id location   grain
0  BBG.XETR.AD.S     XETR  16.545
1  BBG.XLON.VB.S     XLON  6.2154
2  BBG.XLON.HF.S     XLON     NaN
3  BBG.XLON.RE.S     XLON     NaN
4  BBG.XLON.LL.S     XLON     NaN
5  BBG.XLON.AN.S     XLON   3.215
6  BBG.XLON.TR.S     XLON     NaN
7  BBG.XLON.VO.S     XLON     NaN


In reality this dataframe will be much larger. I would like to iterate over it and return the 'grain' value, but only for the rows that have a value (not NaN) in the 'grain' column. So as I iterate over the dataframe, only the following values should be returned:

16.545
6.2154
3.215


I can iterate over the dataframe using:

for staticidx, row in df.iterrows():
    value = row['grain']


But this returns a value for all rows including those with a NaN value. Is there a way to either remove the NaN rows from the dataframe or skip the rows in the dataframe where grain equals NaN?

Many thanks

Answer

You can pass a list of columns to dropna through its subset parameter, so that only NaN values in those columns cause a row to be dropped. From the documentation:

subset : array-like
    Labels along other axis to consider, e.g. if you are dropping rows these would be a list of columns to include

>>> df.dropna(subset=['grain'])
              id location    grain
0  BBG.XETR.AD.S     XETR  16.5450
1  BBG.XLON.VB.S     XLON   6.2154
5  BBG.XLON.AN.S     XLON   3.2150
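
To tie this back to the loop in the question, here is a minimal sketch that rebuilds the sample dataframe and iterates only over the rows that survive dropna. The names filtered and values are illustrative and not from the original post. Note that dropna returns a new dataframe and leaves df unchanged unless you reassign it.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question
df = pd.DataFrame({
    "id": ["BBG.XETR.AD.S", "BBG.XLON.VB.S", "BBG.XLON.HF.S", "BBG.XLON.RE.S",
           "BBG.XLON.LL.S", "BBG.XLON.AN.S", "BBG.XLON.TR.S", "BBG.XLON.VO.S"],
    "location": ["XETR"] + ["XLON"] * 7,
    "grain": [16.545, 6.2154, np.nan, np.nan, np.nan, 3.215, np.nan, np.nan],
})

# Drop only the rows where 'grain' is NaN; df itself is not modified
filtered = df.dropna(subset=["grain"])

# Same loop as in the question, now over the filtered rows
values = []
for staticidx, row in filtered.iterrows():
    values.append(row["grain"])

print(values)
```

If only the 'grain' values are needed and not the rest of each row, iterating over df['grain'].dropna() directly should give the same values without building a Series for every row, which tends to matter on a large dataframe.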