theSage - 2 months ago
Python Question

Marking certain days in date indexed pandas dataframes

I've got a dataframe which is date indexed (d-m-y). I wanted to create a binary feature column which denotes if a date is the second Saturday of the month.

What I've got so far is this:

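def get_second_true(x):
    # Return the position of the second True value in x, or None if there is none.
    seen_first = False
    for index, is_true in enumerate(x):
        if is_true and seen_first:
            return index
        if is_true:
            seen_first = True
    return None

# In pandas, Monday is weekday 0, so Saturday is 5 (6 would be Sunday).
second_saturdays = df.groupby(['month', 'year']).apply(
    lambda x: x.index.weekday == 5
).apply(get_second_true)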


I'm unable to get this back into a series which relates to the original dataframe's index in such a way that each row has a label of whether it is a second Saturday or not.

This feels like a common enough scenario, but I can't find the term for it. I've looked at unstack and reset_index, but I don't understand them well enough to know whether they can do this, or whether multilevel indexing is needed at all.

Answer

pandas has a special Week-Of-Month frequency, for example WOM-2SUN for the second Sunday of each month (use WOM-2SAT for the second Saturday), so you can do it this way:

In [87]: import pandas as pd

In [88]: df = pd.DataFrame({'date':pd.date_range('2000-01-01', periods=365)})

In [89]: df
Out[89]:
          date
0   2000-01-01
1   2000-01-02
2   2000-01-03
3   2000-01-04
4   2000-01-05
5   2000-01-06
6   2000-01-07
7   2000-01-08
8   2000-01-09
9   2000-01-10
..         ...
355 2000-12-21
356 2000-12-22
357 2000-12-23
358 2000-12-24
359 2000-12-25
360 2000-12-26
361 2000-12-27
362 2000-12-28
363 2000-12-29
364 2000-12-30

[365 rows x 1 columns]

In [90]: df.loc[df.date.isin(pd.date_range(start=df.date.min(), end=df.date.max(), freq='WOM-2SUN'))]
Out[90]:
          date
8   2000-01-09
43  2000-02-13
71  2000-03-12
99  2000-04-09
134 2000-05-14
162 2000-06-11
190 2000-07-09
225 2000-08-13
253 2000-09-10
281 2000-10-08
316 2000-11-12
344 2000-12-10
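
The session above selects the matching rows from a date column. The question asks for a binary column on a date-indexed frame, so here is a minimal sketch of that variant. The frame, its `value` column, and the `is_second_saturday` column name are made up for illustration. It uses WOM-2SAT for Saturdays, and cross-checks the result against a plain rule: the second Saturday is the Saturday whose day of the month falls between 8 and 14.

```python
import pandas as pd

# Hypothetical date-indexed frame standing in for the asker's data.
df = pd.DataFrame(
    {"value": range(366)},
    index=pd.date_range("2000-01-01", periods=366, freq="D"),
)

# Approach 1: generate every second Saturday in the covered range,
# then flag the index entries that appear in that list.
second_sats = pd.date_range(df.index.min(), df.index.max(), freq="WOM-2SAT")
df["is_second_saturday"] = df.index.isin(second_sats)

# Approach 2: no date_range needed. Monday is weekday 0, so Saturday is 5,
# and the second Saturday always lands on day 8 through 14 of the month.
by_rule = (df.index.weekday == 5) & (df.index.day >= 8) & (df.index.day <= 14)

# Both approaches should mark exactly the same rows.
assert (df["is_second_saturday"].to_numpy() == by_rule).all()

print(df.index[df["is_second_saturday"]])
```

Because the flag is built directly from the frame's own index, it lines up with every row and no groupby, unstack, or reset_index is needed. Note that the membership test in the first approach assumes the index holds midnight timestamps; if yours carries times of day, the rule-based version (or normalizing the index first) is the safer choice.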