durbachit - 8 months ago 30

Python Question

This is really weird. I have tried several ways of dropping rows with missing data from a pandas dataframe, but none of them seem to work.

This is the code (I just uncomment one of the methods used - but these are the three that I used in different modifications - this is the latest):

`import pandas as pd`

Test = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,'NaN',4,5],'C':[1,2,3,'NaT',5]})

print(Test)

#Test = Test.ix[Test.C.notnull()]

#Test = Test.dropna()

Test = Test[~Test[Test.columns.values].isnull()]

print "And now"

print(Test)

But in all cases, all I get is this:

`A B C`

0 1 1 1

1 2 2 2

2 3 NaN 3

3 4 4 NaT

4 5 5 5

And now

A B C

0 1 1 1

1 2 2 2

2 3 NaN 3

3 4 4 NaT

4 5 5 5

Is there any mistake that I am making? or what is the problem? Ideally, I would like to get this:

`A B C`

0 1 1 1

1 2 2 2

4 5 5 5

Answer

Your example DF has `NaN`

and `NaT`

as strings which `.dropna`

, `.notnull`

and co. won't consider falsey, so given your example you can use...

```
df[~df.isin(['NaN', 'NaT']).any(axis=1)]
```

Which gives you:

```
A B C
0 1 1 1
1 2 2 2
4 5 5 5
```

If you had a DF such as (note of the use of `np.nan`

and `np.datetime64('NaT')`

instead of strings:

```
df = pd.DataFrame({'A':[1,2,3,4,5],'B':[1,2,np.nan,4,5],'C':[1,2,3,np.datetime64('NaT'),5]})
```

Then running `df.dropna()`

which give you:

```
A B C
0 1 1.0 1
1 2 2.0 2
4 5 5.0 5
```

Note that column `B`

is now a `float`

instead of an integer as that's required to store `NaN`

values.