Flowpoke - 16 days ago
Python Question

How to remove unconverted data from a Python datetime object

I have a database of mostly correct datetimes, but a few are broken, like so:

Sat Dec 22 12:34:08 PST 20102015


Without the invalid year, this was working for me:

import time
from datetime import datetime

end_date = soup('tr')[4].contents[1].renderContents()
end_date = time.strptime(end_date, "%a %b %d %H:%M:%S %Z %Y")
end_date = datetime.fromtimestamp(time.mktime(end_date))


But once I hit an object with an invalid year, I get

ValueError: unconverted data remains: 2

which is great, but I'm not sure how best to strip the bad characters out of the year. They range from 2 to 6 unconverted characters.
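
A minimal sketch of reading the leftover straight from the exception. It assumes CPython's `_strptime` wording (`unconverted data remains: <text>`), which is an implementation detail rather than documented API, so treat it as fragile. Note that `%Z` only accepts `UTC`, `GMT` and the names in `time.tzname`, so the example strings use `UTC`; `PST` parses only on a machine whose local zone is US Pacific. The name `parse_stripping_leftover` is hypothetical.

```python
from datetime import datetime

FORMAT = "%a %b %d %H:%M:%S %Z %Y"
# Assumption: CPython's _strptime reports trailing junk with this exact prefix.
# The wording is an implementation detail, not a documented API.
LEFTOVER_PREFIX = "unconverted data remains: "


def parse_stripping_leftover(text, fmt=FORMAT):
    """Parse text; if strptime reports unconverted trailing data, cut it off and retry once."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError as exc:
        message = str(exc)
        if not message.startswith(LEFTOVER_PREFIX):
            raise  # some other mismatch (e.g. an unknown zone name); don't guess
        leftover = message[len(LEFTOVER_PREFIX):]
        # Drop exactly as many trailing characters as strptime could not convert.
        return datetime.strptime(text[: len(text) - len(leftover)], fmt)
```

Because `%Y` consumes exactly four digits, `"Sat Dec 22 12:34:08 UTC 20102015"` leaves `2015` unconverted, and the retry parses the year as 2010.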

Any pointers? I would just slice end_date, but I'm hoping there is a datetime-safe strategy.

Answer

Yeah, I'd just chop off the extra digits. Assuming they are always appended to the date string, something like this would work:

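end_date = end_date.split(" ")     # split the date string into its fields
end_date[-1] = end_date[-1][:4]    # keep only the first four digits of the year
end_date = " ".join(end_date)      # reassemble the string for strptime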
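
The same slicing idea wrapped in a helper, as a sketch. `trim_year` and `parse_end_date` are hypothetical names, and the code assumes the junk digits are always glued onto the final (year) field. It also swaps in `datetime.strptime` directly instead of the `time.strptime` / `time.mktime` / `fromtimestamp` chain from the question; that part is a substitution, not what the answer above does.

```python
from datetime import datetime

FORMAT = "%a %b %d %H:%M:%S %Z %Y"


def trim_year(text):
    """Keep only the first four characters of the last whitespace-separated field."""
    fields = text.split()          # split() with no argument also collapses repeated spaces
    fields[-1] = fields[-1][:4]    # assumption: the extra digits follow the real year
    return " ".join(fields)


def parse_end_date(text):
    """Trim the year, then parse straight into a naive datetime."""
    return datetime.strptime(trim_year(text), FORMAT)
```

Splitting with no argument means a ctime-style padded day such as `"Dec  2"` still survives the round trip, since `%d` accepts a single digit.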

I was going to try to get the number of excess digits from the exception, but on my installed versions of Python (2.6.6 and 3.1.2) that information isn't actually there; it just says that the data does not match the format. Of course, you could just continue lopping off digits one at a time and re-parsing until you don't get an exception.
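
A sketch of the one-digit-at-a-time retry. `parse_lopping_digits` and its `max_extra` cap are hypothetical; the default of 6 comes from the "2 to 6" range in the question. Since the format ends in `%Y`, which needs exactly four digits, no shorter or longer year can parse, so the first success is the real four-digit year. As above, the test strings use `UTC` because `%Z` rejects zone names outside `UTC`, `GMT` and `time.tzname`.

```python
from datetime import datetime

FORMAT = "%a %b %d %H:%M:%S %Z %Y"


def parse_lopping_digits(text, fmt=FORMAT, max_extra=6):
    """Try to parse text, dropping one trailing character per failed attempt.

    max_extra caps how many characters may be removed, so genuinely
    malformed strings still raise instead of being silently truncated.
    """
    for extra in range(max_extra + 1):
        candidate = text[: len(text) - extra]   # extra == 0 tries the untouched string
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            continue                            # still junk at the end; lop one more
    raise ValueError(f"could not parse {text!r} after removing up to {max_extra} characters")
```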

You could also write a regex that will match only valid dates, including the right number of digits in the year, but that seems like overkill.
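
For comparison, a sketch of the regex route, assuming every string has the shape shown in the question: abbreviated weekday and month, day, `HH:MM:SS`, an upper-case zone abbreviation, then the year. `DATE_RE` and `parse_with_regex` are hypothetical; the pattern keeps the first four year digits and discards any digits after them.

```python
import re
from datetime import datetime

# Assumed shape: "Sat Dec 22 12:34:08 PST 20102015" -> date part, 4-digit year, junk digits.
DATE_RE = re.compile(
    r"^(?P<date>[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} [A-Z]{3,4}) "
    r"(?P<year>\d{4})\d*$"
)


def parse_with_regex(text, fmt="%a %b %d %H:%M:%S %Z %Y"):
    """Validate the overall shape, keep four year digits, then hand off to strptime."""
    match = DATE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized date string: {text!r}")
    return datetime.strptime(match.group("date") + " " + match.group("year"), fmt)
```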
