CF84 CF84 - 10 days ago 6
Python Question

Pandas: drop certain values from a column with a string title

Say I have the following dataframe df:

    First C  Second C   Third C
0  0.104000  0.864000      -999
1  0.060337  0.812470      -999
2  0.065797  0.819570  0.802607
3  0.064715  0.817212  0.801755


I want to drop the first two rows because column Third C contains an invalid value (-999) in both of them.

df = df.drop(df[df.('Third C') == -999].index)


This throws:

df = df.drop(df[df.('Third C') == -999].index)
                   ^
SyntaxError: invalid syntax


And the same thing happens if I use df.['Third C'] with square brackets. How can I perform this operation without having to rename my column?

Answer

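Use only square brackets and remove the dot. A dot must be followed by a valid identifier, so a column whose name contains a space can only be selected with df['Third C']: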

df = df.drop(df[df['Third C'] == -999].index)

But it is better to use boolean indexing and keep only the rows you want:

df = df[df['Third C'] != -999]
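As a self-contained sketch of both approaches, the snippet below rebuilds the example frame (values copied from the question; the names dropped and filtered are only illustrative) and checks that the two expressions select the same rows:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'First C': [0.104000, 0.060337, 0.065797, 0.064715],
    'Second C': [0.864000, 0.812470, 0.819570, 0.817212],
    'Third C': [-999, -999, 0.802607, 0.801755],
})

# Approach 1: find the index labels of the bad rows, then drop those labels.
dropped = df.drop(df[df['Third C'] == -999].index)

# Approach 2: keep only the rows that pass the boolean mask.
filtered = df[df['Third C'] != -999]

print(filtered)
print(dropped.equals(filtered))
```

Both results keep the original index labels (2 and 3); call reset_index(drop=True) on the result if you want the rows renumbered from 0.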

Timings:

The drop solution is slower because it performs the same boolean indexing and then an additional drop on the resulting index:

In [204]: %timeit (df.drop(df[df['Third C'] == -999].index))
1000 loops, best of 3: 691 µs per loop

In [205]: %timeit (df[df['Third C'] != -999])
1000 loops, best of 3: 359 µs per loop
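If you want to repeat the comparison outside IPython, a rough sketch using the standard timeit module could look like this. The helper names with_drop and with_mask and the loop count are assumptions made for illustration, and the absolute numbers will depend on your machine and pandas version:

```python
import timeit

import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({
    'First C': [0.104000, 0.060337, 0.065797, 0.064715],
    'Second C': [0.864000, 0.812470, 0.819570, 0.817212],
    'Third C': [-999, -999, 0.802607, 0.801755],
})


def with_drop():
    # Boolean indexing to find the bad rows, followed by drop().
    return df.drop(df[df['Third C'] == -999].index)


def with_mask():
    # Boolean indexing only.
    return df[df['Third C'] != -999]


n = 200  # number of calls per measurement (arbitrary choice)
t_drop = timeit.timeit(with_drop, number=n)
t_mask = timeit.timeit(with_mask, number=n)

print(f"drop: {t_drop / n * 1e6:.0f} µs per call")
print(f"mask: {t_mask / n * 1e6:.0f} µs per call")
```

On a frame this small the measurement is dominated by fixed overhead, so treat the ratio as indicative rather than exact.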