adele adele - 4 months ago 21
Python Question

Delete first row in Dataframe each day for certain value only

Is there a way to delete the first row of each day in a DataFrame, but only when that row holds a certain value? For example:

2014-03-04 10:00:00 -1.0
2014-03-04 10:04:00 1.0
2014-03-04 10:42:00 -1.0

2014-03-05 09:57:00 1.0
2014-03-05 10:05:00 -1.0
2014-03-05 10:30:00 1.0


For each day above, if 1.0 is the first value of the day, that row should be deleted. In the example above, this would see row
2014-03-05 09:57:00
deleted.

I can't think of a way to do this without iterating through the DataFrame rows using something like
for day in df.index:
which is slow on a large dataset.

Answer

You can first group by DatetimeIndex.date and take the first row of each day with head(1). Then use boolean indexing to find the first rows whose value is 1, and finally drop them by index label.

This solution works well as long as the datetimes are not duplicated, because drop removes every row that shares a dropped label:

print(df)
                     col
2014-03-04 10:00:00 -1.0
2014-03-04 10:04:00  1.0
2014-03-04 10:42:00 -1.0
2014-03-05 09:57:00  1.0
2014-03-05 10:05:00 -1.0
2014-03-05 10:30:00  1.0

# First row of each calendar day
df1 = df['col'].groupby(df.index.date).head(1)
print(df1)
2014-03-04 10:00:00   -1.0
2014-03-05 09:57:00    1.0
Name: col, dtype: float64

# Index labels of the first rows whose value is 1
print(df1[df1 == 1].index)
DatetimeIndex(['2014-03-05 09:57:00'], dtype='datetime64[ns]', freq=None)

# Drop those rows from the original DataFrame
print(df.drop(df1[df1 == 1].index))
                     col
2014-03-04 10:00:00 -1.0
2014-03-04 10:04:00  1.0
2014-03-04 10:42:00 -1.0
2014-03-05 10:05:00 -1.0
2014-03-05 10:30:00  1.0
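
If the index might contain duplicated datetimes, one possible variant is to build a positional boolean mask instead of dropping by label. The following is only a sketch, not part of the original answer: it rebuilds the sample data from the question, keeps the column name col used above, and assumes the rows are already in chronological order within each day. The names pos_in_day, mask and result are made up for illustration.

```python
import pandas as pd

# Rebuild the sample data from the question (column name "col" as in the answer).
idx = pd.to_datetime([
    "2014-03-04 10:00:00", "2014-03-04 10:04:00", "2014-03-04 10:42:00",
    "2014-03-05 09:57:00", "2014-03-05 10:05:00", "2014-03-05 10:30:00",
])
df = pd.DataFrame({"col": [-1.0, 1.0, -1.0, 1.0, -1.0, 1.0]}, index=idx)

# Position of each row within its calendar day; 0 marks the first row of the day.
# Assumption: rows are already ordered by time within each day.
pos_in_day = df.groupby(df.index.date).cumcount()

# Mark a row for removal only if it is first in its day AND its value is 1.0.
# Plain NumPy arrays are used so the mask is positional, not label-based.
mask = (pos_in_day.to_numpy() == 0) & (df["col"].to_numpy() == 1.0)

result = df[~mask]
print(result)
```

Because the mask selects rows by position, a later row that happens to share a timestamp with the removed row would be kept, which is not the case with drop.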