sapo_cosmico sapo_cosmico - 4 months ago 88
Python Question

How to filter a pandas series with a datetime index on the quarter and year

I have a Series, called 'scores', with a datetime index.

I wish to subset it by

quarter
and
year


pseudocode:
series.loc['q2 of 2013']


Attempts so far:

s.dt.quarter



AttributeError: Can only use .dt accessor with datetimelike values


s.index.dt.quarter



AttributeError: 'DatetimeIndex' object has no attribute 'dt'


This works (inspired by this answer), but I can't believe it is the right way to do this in Pandas:


d = pd.DataFrame(s)


d['date'] = pd.to_datetime(d.index)


d.loc[(d['date'].dt.quarter == 2) & (d['date'].dt.year == 2013)]['scores']



I expect there is a way to do this without transforming into a dataset, forcing the index into datetime, and then getting a Series from it.

What am I missing, and what is the elegant way to do this on a Pandas series?

Answer
import numpy as np
import pandas as pd

index = pd.date_range('2013-01-01', freq='M', periods=12)
s = pd.Series(np.random.rand(12), index=index)
print(s)

# 2013-01-31    0.820672
# 2013-02-28    0.994890
# 2013-03-31    0.928376
# 2013-04-30    0.848532
# 2013-05-31    0.122263
# 2013-06-30    0.305741
# 2013-07-31    0.088432
# 2013-08-31    0.647288
# 2013-09-30    0.640308
# 2013-10-31    0.737139
# 2013-11-30    0.233656
# 2013-12-31    0.245214
# Freq: M, dtype: float64

d = pd.Series(s.index, index=s.index)
quarter = (d.dt.quarter.astype(str) + 'Q' + d.dt.year.astype(str))
print(quarter)

# 2013-01-31    1Q2013
# 2013-02-28    1Q2013
# 2013-03-31    1Q2013
# 2013-04-30    2Q2013
# 2013-05-31    2Q2013
# 2013-06-30    2Q2013
# 2013-07-31    3Q2013
# 2013-08-31    3Q2013
# 2013-09-30    3Q2013
# 2013-10-31    4Q2013
# 2013-11-30    4Q2013
# 2013-12-31    4Q2013
# Freq: M, dtype: object

print(s[quarter == '1Q2013'])

# 2013-01-31    0.124398
# 2013-02-28    0.052828
# 2013-03-31    0.126374
# Freq: M, dtype: float64

If you don't want to create a new Series that holds a label for each quarter (e.g., if you only subsetting once), you could even do

print(s[(s.index.quarter == 1) & (s.index.year == 2013)])

# 2013-01-31    0.124398
# 2013-02-28    0.052828
# 2013-03-31    0.126374
# Freq: M, dtype: float64