Dileep Dileep - 2 months ago 9
Python Question

Querying a pandas DataFrame to filter rows where a column is not NaN

I am new to Python and am using pandas.

I want to query a DataFrame and keep only the rows where one of the columns is not NaN.

I have tried:

a=dictionarydf.label.isnull()


but a is just populated with True or False values.

I also tried this:

dictionarydf.query(dictionarydf.label.isnull())


but it gave an error, as I expected.

Sample data:

    reference_word         all_matching_words   label  review
0          account             fees - account     NaN       N
1          account           mobile - account     NaN       N
2          account          monthly - account     NaN       N
3   administration  delivery - administration     NaN       N
4   administration      fund - administration     NaN       N
5          advisor             fees - advisor     NaN       N
6          advisor          optimum - advisor     NaN       N
7          advisor              sub - advisor     NaN       N
8            aichi           delivery - aichi     NaN       N
9            aichi               pref - aichi     NaN       N
10         airport              biz - airport  travel       N
11         airport              cfo - airport  travel       N
12         airport           cfomtg - airport  travel       N
13         airport          meeting - airport  travel       N
14         airport           summit - airport  travel       N
15         airport             taxi - airport  travel       N
16         airport            train - airport  travel       N
17         airport         transfer - airport  travel       N
18         airport             trip - airport  travel       N
19             ais                admin - ais     NaN       N
20             ais               alpine - ais     NaN       N
21             ais                 fund - ais     NaN       N
22      allegiance       custody - allegiance     NaN       N
23      allegiance          fees - allegiance     NaN       N
24           alpha               late - alpha     NaN       N
25           alpha               meal - alpha     NaN       N
26           alpha               taxi - alpha     NaN       N
27          alpine             admin - alpine     NaN       N
28          alpine               ais - alpine     NaN       N
29          alpine              fund - alpine     NaN       N


I want to keep only the rows where label is not NaN.

Expected output:

   reference_word  all_matching_words   label  review
0         airport       biz - airport  travel       N
1         airport       cfo - airport  travel       N
2         airport    cfomtg - airport  travel       N
3         airport   meeting - airport  travel       N
4         airport    summit - airport  travel       N
5         airport      taxi - airport  travel       N
6         airport     train - airport  travel       N
7         airport  transfer - airport  travel       N
8         airport      trip - airport  travel       N

Answer

You can use dropna with the subset parameter, so that only NaN values in the label column cause a row to be dropped:

df = df.dropna(subset=['label'])

print(df)
   reference_word  all_matching_words   label review
10        airport       biz - airport  travel      N
11        airport       cfo - airport  travel      N
12        airport    cfomtg - airport  travel      N
13        airport   meeting - airport  travel      N
14        airport    summit - airport  travel      N
15        airport      taxi - airport  travel      N
16        airport     train - airport  travel      N
17        airport  transfer - airport  travel      N
18        airport      trip - airport  travel      N
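
Note that the output above keeps the original index (10 to 18), while the expected output in the question is numbered from 0. A minimal self-contained sketch of one way to get that numbering, assuming a small made-up DataFrame with the same column names as the question (the rows and the names filtered and renumbered are illustrative, not the asker's data):

```python
import numpy as np
import pandas as pd

# Small hypothetical sample mirroring the question's columns.
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "all_matching_words": ["fees - account", "biz - airport",
                           "taxi - airport", "admin - ais"],
    "label": [np.nan, "travel", "travel", np.nan],
    "review": ["N", "N", "N", "N"],
})

# Keep rows whose label is present; the original index labels survive.
filtered = df.dropna(subset=["label"])

# Renumber from 0, discarding the old index instead of keeping it as a column.
renumbered = filtered.reset_index(drop=True)

print(renumbered)
```

Whether to reset the index depends on whether the original row positions are still needed later, for example to join the result back to the full DataFrame.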

Another solution is boolean indexing with notnull:

df = df[df.label.notnull()]

print(df)
   reference_word  all_matching_words   label review
10        airport       biz - airport  travel      N
11        airport       cfo - airport  travel      N
12        airport    cfomtg - airport  travel      N
13        airport   meeting - airport  travel      N
14        airport    summit - airport  travel      N
15        airport      taxi - airport  travel      N
16        airport     train - airport  travel      N
17        airport  transfer - airport  travel      N
18        airport      trip - airport  travel      N
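
As for the query attempt in the question: query expects a string expression, not the boolean Series that isnull() returns, which is probably why it raised an error. The following sketch, using a small made-up DataFrame (the data and variable names are assumptions for illustration), shows the boolean mask next to two query-based equivalents:

```python
import numpy as np
import pandas as pd

# Small hypothetical sample; only the columns needed for the illustration.
df = pd.DataFrame({
    "reference_word": ["account", "airport", "ais"],
    "label": [np.nan, "travel", np.nan],
})

# notnull() returns a boolean Series with one True/False per row ...
mask = df["label"].notnull()

# ... which is meant to be used as an indexer, not passed to query().
by_mask = df[mask]

# query() takes a string expression. NaN never compares equal to itself,
# so "label == label" is True only where label is present.
by_query = df.query("label == label")

# Calling the method inside the expression; engine="python" is passed
# explicitly here as a conservative choice.
by_method = df.query("label.notnull()", engine="python")

print(by_mask)
```

All three variables select the same rows; the boolean mask is usually the most direct option when the column name is a valid Python identifier or can be accessed with df["..."].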