cammil - 9 days ago
Python Question

What is the logic of parsing datetime with years and months?

I am not sure why '200011' parses to 2000-11-01 with '%Y%m' as the format, when '200013' with '%Y%m' fails and '200011' with '%Y%m%d' parses to 2000-01-01. See code:

>>> datetime.datetime.strptime('200013', '%Y%m')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/python3.5/_strptime.py", line 510, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)
  File ".../lib/python3.5/_strptime.py", line 346, in _strptime
    data_string[found.end():])
ValueError: unconverted data remains: 3
>>> datetime.datetime.strptime('200011', '%Y%m')
datetime.datetime(2000, 11, 1, 0, 0)
>>> datetime.datetime.strptime('200011', '%Y%m%d')
datetime.datetime(2000, 1, 1, 0, 0)


Any ideas what's going on?

wim
Answer

TL;DR: The Python documentation neglects to mention that zero-padding of the month is optional.

>>> from datetime import datetime
>>> pattern = '%Y%m'
>>> datetime.strptime('20161', pattern).strftime(pattern)
'201601'  # Note an extra "0" has appeared

The format codes for strptime and strftime come from the C standard library. The Python documentation omits a few important details here; the relevant section just says:

%m Month as a zero-padded decimal number.

However, it is also mentioned:

The full set of format codes supported varies across platforms, because Python calls the platform C library’s strftime() function, and platform variations are common.

The behaviour which is causing surprising results here, i.e. the handling of leading zeros, is better documented for C:

%Y The full year {4}; leading zeros shall be permitted but shall not be required. A leading '+' or '-' character shall be permitted before any leading zeros but shall not be required.

%m The month number [01,12]; leading zeros shall be permitted but shall not be required.

%d The day of the month [01,31]; leading zeros shall be permitted but shall not be required.

Emphasis mine. Source: the POSIX specification of strptime.
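As a quick check of the optional leading zeros, here is a minimal sketch. The dashes in the format are my own addition, used only so that the field boundaries are unambiguous; the variable names are illustrative.

```python
from datetime import datetime

# Padded and unpadded month/day fields should parse to the same value.
padded = datetime.strptime('2016-01-05', '%Y-%m-%d')
unpadded = datetime.strptime('2016-1-5', '%Y-%m-%d')

print(padded == unpadded)  # True
print(unpadded)            # 2016-01-05 00:00:00
```

With separators in place there is only one way to split the string, so the padding question never affects the result.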


So with the knowledge that leading zeros may or may not be present, all the cases mentioned are correctly accounted for:

datetime.strptime('200013', '%Y%m')  # Cannot parse

Since 13 is not a valid month, the parser is forced to take 1 as the month, with the leading zero omitted. You then get a ValueError because the parser does not know what to do with the extra data "3".
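To see the leftover data for yourself, a small sketch that catches the exception and prints its message (the `message` variable is just for illustration):

```python
from datetime import datetime

message = None
try:
    datetime.strptime('200013', '%Y%m')
except ValueError as exc:
    # The message names the part of the input that the format did not consume.
    message = str(exc)

print(message)  # unconverted data remains: 3
```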

datetime.strptime('200011', '%Y%m')  # Parses to 1st Nov

The parser takes November (11) as the month, and the day defaults to 1. January cannot be the month here, because that would leave a trailing '1' unaccounted for by this pattern. The parser must therefore be greedy and consume '11' for the month.

datetime.strptime('200011', '%Y%m%d')  # Parses to 1st Jan

Here we see that '200011' can be parsed successfully by either pattern, %Y%m or %Y%m%d. With the %Y%m%d pattern, the month must be January (1); otherwise there is no remaining data to fill %d. Note that leading zeros are also optional for %d.
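The three cases can be reproduced with plain regular expressions. The traceback in the question points at `_strptime.py` and a `found.end()` call, which suggests that the format is matched as a regex. The sketch below assumes that and uses simplified stand-in patterns of my own (`YEAR`, `MONTH`, `DAY` and the helper `match_fields` are hypothetical names, not the library's API, and the real patterns may differ):

```python
import re

# Simplified stand-ins for the per-directive patterns (an assumption, not
# copied from the library). Two-digit alternatives are tried first.
YEAR = r"(?P<Y>\d{4})"
MONTH = r"(?P<m>1[0-2]|0[1-9]|[1-9])"
DAY = r"(?P<d>3[01]|[12]\d|0[1-9]|[1-9])"


def match_fields(text, pattern):
    """Return (captured fields, leftover text), or None if nothing matches."""
    found = re.match(pattern, text)
    if found is None:
        return None
    return found.groupdict(), text[found.end():]


# '11' is a valid month, so it is consumed whole.
print(match_fields("200011", YEAR + MONTH))
# ({'Y': '2000', 'm': '11'}, '')

# '13' is not a valid month, so only '1' matches and '3' is left over.
print(match_fields("200013", YEAR + MONTH))
# ({'Y': '2000', 'm': '1'}, '3')

# The day needs at least one digit, so the month backtracks from '11' to '1'.
print(match_fields("200011", YEAR + MONTH + DAY))
# ({'Y': '2000', 'm': '1', 'd': '1'}, '')
```

The non-empty leftover in the second case corresponds to the "unconverted data remains" error, and the backtracking in the third case is why the same string yields January there.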