alexshchep - 2 months ago
Python Question

Pandas DataFrame: delete rows with a certain value until that value changes

I have a DataFrame with zeros in its first rows. These zeros act as NAs, and I would like to delete those rows until other values begin to appear.

So, I would like this dataframe:

df_
Out[114]:
                   A         B         C
2016-08-27 -0.263963  0.000000  0.693514
2016-08-28 -0.085663  0.000000 -0.715981
2016-08-29  1.408283  0.000000  2.513716
2016-08-30 -0.591532  0.000000 -1.468227
2016-08-31 -0.973261  0.000000  0.848670
2016-09-01  0.694384 -0.214615  0.561752
2016-09-02 -1.468527  0.259413  1.195574
2016-09-03 -1.471785  0.006788  0.688078
2016-09-04 -0.817770  0.453037  0.632851
2016-09-05  1.129863  0.000000 -0.296562


to have just its top 5 rows dropped, because column 'B' contains zeros in those rows, while keeping the rest (including the last row, even though it also has a zero in 'B').

Answer

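You can test whether every value in a row is non-zero with (df != 0).all(axis=1). Use the resulting boolean Series to mask the df, call first_valid_index on the masked frame to get the label of the first row without zeros, and slice the df from that label onward: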

In [40]:
df.loc[df[(df != 0).all(axis=1)].first_valid_index():]

Out[40]:
                   A         B         C
2016-09-01  0.694384 -0.214615  0.561752
2016-09-02 -1.468527  0.259413  1.195574
2016-09-03 -1.471785  0.006788  0.688078
2016-09-04 -0.817770  0.453037  0.632851
2016-09-05  1.129863  0.000000 -0.296562
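For a quick end-to-end check, the sketch below rebuilds the question's DataFrame from the printed values (assuming a daily DatetimeIndex, which the dates suggest) and runs the same expression split into steps. The names all_nonzero, start and result are introduced here for illustration only.

```python
import pandas as pd

# Rebuild the question's DataFrame from the printed values
# (assumption: a daily DatetimeIndex from 2016-08-27 to 2016-09-05)
idx = pd.date_range("2016-08-27", "2016-09-05", freq="D")
df = pd.DataFrame(
    {
        "A": [-0.263963, -0.085663, 1.408283, -0.591532, -0.973261,
              0.694384, -1.468527, -1.471785, -0.817770, 1.129863],
        "B": [0.0, 0.0, 0.0, 0.0, 0.0,
              -0.214615, 0.259413, 0.006788, 0.453037, 0.0],
        "C": [0.693514, -0.715981, 2.513716, -1.468227, 0.848670,
              0.561752, 1.195574, 0.688078, 0.632851, -0.296562],
    },
    index=idx,
)

# True for rows in which every column is non-zero
all_nonzero = (df != 0).all(axis=1)

# df[all_nonzero] keeps only those rows; first_valid_index returns
# the label of the first of them (2016-09-01 here)
start = df[all_nonzero].first_valid_index()

# Label-based slice from that row to the end of the frame
result = df.loc[start:]
print(result)
```

Because the slice is label-based, every row after start is kept even if it contains zeros, which is why 2016-09-05 survives.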

Here is the output of the inner test:

In [37]:
(df != 0).all(axis=1)

Out[37]:
2016-08-27    False
2016-08-28    False
2016-08-29    False
2016-08-30    False
2016-08-31    False
2016-09-01     True
2016-09-02     True
2016-09-03     True
2016-09-04     True
2016-09-05    False
dtype: bool
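The test above requires every column to be non-zero, which works here because A and C never contain zeros. If only the leading zeros in column 'B' should decide where the data starts, as the question's wording suggests, a different technique (swapped in here, not part of the answer) is a running count of non-zero values in B. A minimal sketch, assuming the same daily-indexed DataFrame; nonzero_seen and result are names introduced for illustration:

```python
import pandas as pd

# Same data as in the question (assumption: daily DatetimeIndex)
idx = pd.date_range("2016-08-27", "2016-09-05", freq="D")
df = pd.DataFrame(
    {
        "A": [-0.263963, -0.085663, 1.408283, -0.591532, -0.973261,
              0.694384, -1.468527, -1.471785, -0.817770, 1.129863],
        "B": [0.0, 0.0, 0.0, 0.0, 0.0,
              -0.214615, 0.259413, 0.006788, 0.453037, 0.0],
        "C": [0.693514, -0.715981, 2.513716, -1.468227, 0.848670,
              0.561752, 1.195574, 0.688078, 0.632851, -0.296562],
    },
    index=idx,
)

# Cumulative count of non-zero B values seen so far; it becomes
# positive at the first non-zero value and never drops back to 0
nonzero_seen = (df["B"] != 0).cumsum() > 0

# Boolean mask drops only the leading zero rows in B;
# later zeros (e.g. 2016-09-05) are kept
result = df[nonzero_seen]
print(result)
```

One behavioural difference: if B were zero everywhere, this mask would return an empty frame, whereas in the one-liner the masked frame would be empty, first_valid_index would return None, and df.loc[None:] would return the whole DataFrame.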