Jan Krüger - 8 days ago
Python Question

Python: Parse Twitter Timestamp from CSV

I'm trying to list my tweets from my own archive using Python. The only thing I'm having trouble with is converting the timestamp from a string into a datetime object. Here is an excerpt of my CSV:

"tweet_id","in_reply_to_status_id","in_reply_to_user_id","timestamp","source","text","retweeted_status_id","retweeted_status_user_id","retweeted_status_timestamp","expanded_urls"
"x","y","z","2016-11-27 22:14:47 +0000","<a href=""https://about.twitter.com/products/tweetdeck"" rel=""nofollow"">TweetDeck</a>","@a @b Also feel free to do so [2/2]","","","",""


Here's my code:

#!/usr/bin/env python
# encoding: utf-8


import csv
from datetime import datetime

with open('tweets.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=',')
    for row in readCSV:

        # This works like a charm
        date_str = "2016-11-28 07:12:01 +0000"
        dt_obj = datetime.strptime(date_str, "%Y-%m-%d %H:%M:%S +0000")

        # This doesn't
        #date = datetime.strptime(row[3], "%Y-%m-%d %H:%M:%S +0000")

        # Get message
        msg = row[5]

        print("Datestring from CSV: " + row[3])
        print("Datestring from static variable: " + datetime.strftime(dt_obj, "%d.%m.%Y %H:%M:%S"))
        print(msg)


When I run this program, I get the following output:

Datestring from CSV: 2016-11-27 22:14:47 +0000
Datestring from static variable: 28.11.2016 07:12:01
@a @b Also feel free to do so [2/2]


But when I uncomment the line that doesn't work, I get an error:

ValueError: time data 'timestamp' does not match format '%Y-%m-%d %H:%M:%S +0000'


Why is that? The date format seems correct, there is no lowercase L in the timestamp from the CSV, its type is a string, and it should work. I can't see what I'm missing here.

Thanks!

Update



@ArunDhaJ suggested in this answer that I use the dateutil.parser.parse() function instead. If I call it from the interpreter, it works just fine:
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from dateutil.parser import *
>>> parse("2012-06-22 08:12:30 +0000")
datetime.datetime(2012, 6, 22, 8, 12, 30, tzinfo=tzutc())


Running it from the script raises a ValueError. Is this an encoding problem?

Traceback (most recent call last):
  File "./test.py", line 11, in <module>
    date = parse(row[3])
  File "/usr/lib/python2.7/dist-packages/dateutil/parser.py", line 1008, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/usr/lib/python2.7/dist-packages/dateutil/parser.py", line 395, in parse
    raise ValueError("Unknown string format")
ValueError: Unknown string format

Answer

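Try skipping the header row of your CSV. The error message says the time data is 'timestamp', so on the first pass through the loop row[3] holds the column name from the header line, not a date. After readCSV = csv.reader(csvfile, delimiter=','), add: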

readCSV.next()

or, in a form that also works on Python 3 (where the reader object no longer has a .next() method):

next(readCSV)
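
Putting it together, here is a minimal sketch of the loop with the header skipped. The helper name parse_tweets, the TIMESTAMP_FORMAT constant, and the shortened two-line sample are made up for illustration; the format string and the column positions (row[3] for the timestamp, row[5] for the text) are taken from the question.

```python
import csv
from datetime import datetime

# Same format string as in the question; the "+0000" offset is matched literally.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S +0000"


def parse_tweets(lines):
    """Yield (datetime, text) pairs from the lines of a tweets.csv export.

    `lines` can be an open file object or any iterable of strings.
    """
    reader = csv.reader(lines, delimiter=',')
    # Skip the header row; the default avoids an error on empty input.
    next(reader, None)
    for row in reader:
        yield datetime.strptime(row[3], TIMESTAMP_FORMAT), row[5]


# Shortened sample (hypothetical): only the first six columns of the archive.
sample = [
    '"tweet_id","in_reply_to_status_id","in_reply_to_user_id","timestamp","source","text"',
    '"x","y","z","2016-11-27 22:14:47 +0000","TweetDeck","@a @b Also feel free to do so [2/2]"',
]

for created_at, text in parse_tweets(sample):
    print("%s %s" % (created_at.strftime("%d.%m.%Y %H:%M:%S"), text))
```

With the real archive you would pass the open file instead of the sample list, e.g. parse_tweets(csvfile) inside the with block. Another option is csv.DictReader, which consumes the header line itself and lets you write row['timestamp'] instead of row[3].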