Noobie - 2 months ago
Python Question

Pandas: how to create a year-week variable?

I have a DataFrame with datetimes:

import pandas as pd

dates = pd.date_range('9/25/2010', periods=10, freq='D')

df = pd.DataFrame({'col': dates})
df['col'] = pd.to_datetime(df['col'])
df['dow'] = df.col.dt.dayofweek
df['week'] = df.col.dt.to_period('W')
# dt.week was removed in pandas 2.0; isocalendar().week gives the same ISO week number
df['week_alt'] = df.col.dt.year.astype(str) + '-w' + df.col.dt.isocalendar().week.astype(str)

df
Out[21]:
         col  dow                   week  week_alt
0 2010-09-25    5  2010-09-20/2010-09-26  2010-w38
1 2010-09-26    6  2010-09-20/2010-09-26  2010-w38
2 2010-09-27    0  2010-09-27/2010-10-03  2010-w39
3 2010-09-28    1  2010-09-27/2010-10-03  2010-w39
4 2010-09-29    2  2010-09-27/2010-10-03  2010-w39
5 2010-09-30    3  2010-09-27/2010-10-03  2010-w39
6 2010-10-01    4  2010-09-27/2010-10-03  2010-w39
7 2010-10-02    5  2010-09-27/2010-10-03  2010-w39
8 2010-10-03    6  2010-09-27/2010-10-03  2010-w39
9 2010-10-04    0  2010-10-04/2010-10-10  2010-w40


Here you can see that a week starts on Monday and ends on Sunday.
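The Monday start comes from `to_period('W')`: pandas treats the `'W'` alias as `'W-SUN'`, i.e. weekly periods that end on Sunday. A minimal sketch of picking a different anchor, assuming you want Sunday-to-Saturday weeks: `'W-SAT'` makes each period end on Saturday, so it starts on Sunday. The column names `week_mon` and `week_sun` are just illustrative.

```python
import pandas as pd

dates = pd.date_range('2010-09-25', periods=10, freq='D')
df = pd.DataFrame({'col': dates})

# 'W' is an alias for 'W-SUN': weekly periods that END on Sunday (so they start on Monday)
df['week_mon'] = df['col'].dt.to_period('W')
# 'W-SAT': weekly periods that end on Saturday, i.e. weeks that start on Sunday
df['week_sun'] = df['col'].dt.to_period('W-SAT')

# Each Period knows its first day via .start_time
print(df[['col', 'week_mon', 'week_sun']])
```

With this anchor, 2010-09-26 opens the period 2010-09-26/2010-10-02 instead of closing the Monday-based one.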

I would like to have control over when a week starts. For instance, if weeks started on Sunday instead, then 2010-09-26 would be 2010-w39 and 2010-10-03 would be 2010-w40.

How can I do that in Pandas?

Answer

UPDATE: you can choose between these three strftime format directives: %U, %V, %W:

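%U week number of year, with Sunday as first day of week (00..53). Days before the first Sunday of the year fall into week 00.

%V ISO 8601 week number, with Monday as first day of week (01..53). Week 01 is the week containing the year's first Thursday, so a few days in early January can belong to the previous year's last week.

%W week number of year, with Monday as first day of week (00..53). Days before the first Monday of the year fall into week 00.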
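%U and %W only cover Sunday and Monday starts, but both follow the same arithmetic: count the days since January 1, shift by how far the date is into its current week, and divide by 7. A minimal sketch that generalizes this to any start day; `week_of_year` and its `first_weekday` parameter are hypothetical names for illustration, not pandas API. It uses pandas' `dayofweek` convention (Monday=0 ... Sunday=6).

```python
import pandas as pd

def week_of_year(s: pd.Series, first_weekday: int) -> pd.Series:
    """Hypothetical helper: week number like strftime %U/%W, for any week start.

    first_weekday follows pandas' dayofweek convention (Monday=0 ... Sunday=6).
    Days before the first `first_weekday` of the year land in week 0.
    """
    day_of_year0 = s.dt.dayofyear - 1                       # Jan 1 -> 0
    days_into_week = (s.dt.dayofweek - first_weekday) % 7   # 0 on the chosen start day
    return (day_of_year0 + 7 - days_into_week) // 7

dates = pd.Series(pd.date_range('2010-09-25', periods=10, freq='D'))
sunday_weeks = week_of_year(dates, first_weekday=6)  # matches %U
monday_weeks = week_of_year(dates, first_weekday=0)  # matches %W
print(sunday_weeks.tolist())
```

Passing, say, `first_weekday=2` gives Wednesday-based weeks, which no strftime directive offers.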

In [189]: df.col.dt.strftime('%U-%V-%W')
Out[189]:
0    38-38-38
1    39-38-38
2    39-39-39
3    39-39-39
4    39-39-39
5    39-39-39
6    39-39-39
7    39-39-39
8    40-39-39
9    40-40-40
Name: col, dtype: object

%U week number of year, with Sunday as first day of week (00..53).

In [190]: df.col.dt.strftime('%Y-w%U')
Out[190]:
0    2010-w38
1    2010-w39
2    2010-w39
3    2010-w39
4    2010-w39
5    2010-w39
6    2010-w39
7    2010-w39
8    2010-w40
9    2010-w40
Name: col, dtype: object

%V ISO week number, with Monday as first day of week (01..53).

In [191]: df.col.dt.strftime('%G-w%V')  # %G is the ISO year that goes with %V
Out[191]:
0    2010-w38
1    2010-w38
2    2010-w39
3    2010-w39
4    2010-w39
5    2010-w39
6    2010-w39
7    2010-w39
8    2010-w39
9    2010-w40
Name: col, dtype: object
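When using %V, the year part should come from %G (the ISO year) rather than %Y, since the two differ for a few days around New Year. A small sketch comparing both on dates near the boundary; it assumes the platform's strftime supports %G and %V (Python documents them since 3.6).

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2012-01-01', '2012-01-02', '2012-12-31']))

naive = s.dt.strftime('%Y-w%V')  # calendar year + ISO week: can disagree at year boundaries
iso = s.dt.strftime('%G-w%V')    # ISO year + ISO week: always consistent

print(pd.DataFrame({'date': s, 'naive': naive, 'iso': iso}))
```

2012-01-01 is a Sunday in ISO week 52 of 2011, and 2012-12-31 is a Monday in ISO week 01 of 2013, so only the %G version labels them correctly.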

OLD answer:

Is that what you want?

In [73]: df['week'] = df.date.dt.year.astype(str) + '-w' + df.date.dt.isocalendar().week.astype(str)

In [74]: df
Out[74]:
        date      week
0 2012-01-01  2012-w52
1 2012-01-02   2012-w1
2 2012-02-01   2012-w5
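Combining `dt.year` with the ISO week mislabels dates near New Year: 2012-01-01 falls in ISO week 52 of 2011, so the `2012-w52` above is off by a year. A sketch that takes both parts from `Series.dt.isocalendar()` (available since pandas 1.1), assuming the same `date` column as above:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2012-01-01', '2012-01-02', '2012-02-01'])})

# isocalendar() returns a DataFrame with 'year', 'week' and 'day' columns
iso = df['date'].dt.isocalendar()
# Take the ISO year, not dt.year, so it always matches the ISO week number
df['week'] = iso['year'].astype(str) + '-w' + iso['week'].astype(str)
print(df)
```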