user1318135 - 3 months ago
Python Question

How to assign months to their numeric equivalents in Python / Pandas?

Currently, I use a for loop with one if condition per month to assign each month name its numeric equivalent. It seems quite efficient in terms of runtime, but it is too manual and ugly for my taste.

How could this be done better? I imagine the multiple if conditions could be condensed somehow, or replaced by some sort of translator made for date conversions. Which of these would be preferable?

# make numeric month

combined = combined.sort_values('month')
combined.index = range(len(combined))
combined['month_numeric'] = None

for i in combined['month'].unique():
    first = combined['month'].searchsorted(i, side='left')
    last = combined['month'].searchsorted(i, side='right')
    first_num = list(first)[0]  # gives first instance
    last_num = list(last)[0]  # gives last instance
    if i == 'January':
        combined['month_numeric'][first_num:last_num] = "01"
    elif i == 'February':
        combined['month_numeric'][first_num:last_num] = "02"
    elif i == 'March':
        combined['month_numeric'][first_num:last_num] = "03"
    elif i == 'April':
        combined['month_numeric'][first_num:last_num] = "04"
    elif i == 'May':
        combined['month_numeric'][first_num:last_num] = "05"
    elif i == 'June':
        combined['month_numeric'][first_num:last_num] = "06"
    elif i == 'July':
        combined['month_numeric'][first_num:last_num] = "07"
    elif i == 'August':
        combined['month_numeric'][first_num:last_num] = "08"
    elif i == 'September':
        combined['month_numeric'][first_num:last_num] = "09"
    elif i == 'October':
        combined['month_numeric'][first_num:last_num] = "10"
    elif i == 'November':
        combined['month_numeric'][first_num:last_num] = "11"
    elif i == 'December':
        combined['month_numeric'][first_num:last_num] = "12"

Answer

You can parse the month names with to_datetime (format='%B'), take dt.month, convert to string and pad with str.zfill:

print (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))

Sample:

import pandas as pd

df = pd.DataFrame({ 'month': ['January','February', 'December']})
print (df)
      month
0   January
1  February
2  December

print (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
0    01
1    02
2    12
Name: month, dtype: object
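A possible shortcut, not part of the original answer: once the names are parsed, dt.strftime('%m') should give the zero-padded month directly, so the astype/zfill steps can be dropped. This is a minimal sketch on the same sample frame; the variable name month_numeric is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"month": ["January", "February", "December"]})

# Parse the month names (the year defaults to 1900), then format each
# timestamp as a two-digit month string.
month_numeric = pd.to_datetime(df["month"], format="%B").dt.strftime("%m")

print(month_numeric.tolist())  # ['01', '02', '12']
```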

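Another solution is to map the names through a dict d (only the three sample months are listed here; names missing from the dict become NaN):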

d = {'January':'01','February':'02','December':'12'}

print (df['month'].map(d))
0    01
1    02
2    12
Name: month, dtype: object
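For real data the dict needs all twelve months. Rather than typing them out, one option is to build it from the standard library's calendar.month_name. This is a sketch that assumes the process runs in an English (or default C) locale, since calendar.month_name is locale-dependent; the names month_to_num and month_numeric are illustrative.

```python
import calendar

import pandas as pd

# calendar.month_name[0] is an empty string, so start at index 1.
# ASSUMPTION: English/default locale, so the names are 'January', ...
month_to_num = {calendar.month_name[i]: f"{i:02d}" for i in range(1, 13)}

df = pd.DataFrame({"month": ["January", "February", "December"]})
df["month_numeric"] = df["month"].map(month_to_num)

print(df)
```

The result is the same zero-padded strings as above, and any misspelled or unexpected month name shows up as NaN instead of being silently skipped.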

Timings (on 3,000 rows):

df = pd.DataFrame({ 'month': ['January','February', 'December']})
print (df)
df = pd.concat([df]*1000).reset_index(drop=True)

print (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
print (df['month'].map({'January':'01','February':'02','December':'12'}))

In [200]: %timeit (pd.to_datetime(df['month'], format='%B').dt.month.astype(str).str.zfill(2))
100 loops, best of 3: 13.5 ms per loop

In [201]: %timeit (df['month'].map({'January':'01','February':'02','December':'12'}))
1000 loops, best of 3: 462 µs per loop
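To repeat the comparison outside IPython, the standard timeit module can stand in for %timeit. The sketch below is one way to set it up; the helper names via_datetime and via_map are made up for illustration, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"month": ["January", "February", "December"]})
df = pd.concat([df] * 1000).reset_index(drop=True)  # 3,000 rows

d = {"January": "01", "February": "02", "December": "12"}


def via_datetime():
    # Parse names, extract the month number, zero-pad as a string.
    return pd.to_datetime(df["month"], format="%B").dt.month.astype(str).str.zfill(2)


def via_map():
    # Plain dictionary lookup per element.
    return df["month"].map(d)


# Both approaches should agree before their speed is compared.
assert via_datetime().tolist() == via_map().tolist()

# Best of 3 repeats with 100 calls each, reported per call.
t_datetime = min(timeit.repeat(via_datetime, number=100, repeat=3)) / 100
t_map = min(timeit.repeat(via_map, number=100, repeat=3)) / 100

print(f"to_datetime: {t_datetime * 1e3:.3f} ms per call")
print(f"map:         {t_map * 1e3:.3f} ms per call")
```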