Luda Otaku - 15 days ago
Bash Question

python script raises an error depending on how it was called from the shell

I'm making a small tool to convert a Skype chat database into a more readable, IRC-like chat export. I saved the .db files from some of my old Skype chats, and now I'm trying to extract their content. That part works, but there's one thing I cannot figure out.

If I invoke my script as

./skype2text.py file.db chat_partner_id
it works fine, and prints the chat with the specified user id to stdout

Having that working I wanted to save the output to a file instead of printing it to stdout, so I just ran it as
echo $(./skype2text.py file.db chat_partner_id)
first just to see how it went, so I could redirect it to a file, and that's when the weird stuff happens. It prints the first chat line and crashes. (And also completely ignores newlines after that).

$ echo $(./skype2text.py "main 1.db" miya)
Traceback (most recent call last):
File "./skype2text.py", line 62, in <module>
print(u"<" + row[0] + u"> " + unicode(parser.unescape(unicode(row[1]))))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 47: ordinal not in range(128)
<Luda> C'est moi <ss type="wink">;)</ss> <MiYa> None


Here is the code

#!/usr/bin/env python2
# charset=utf-8

from __future__ import print_function

import sys
import sqlite3
import os.path
import HTMLParser


def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)

def eprint_use():
    eprint("usage : " + sys.argv[0] + " <file.db> <partner's skype ID> [output file]")

# actual code here
# first of all check argv and print the help message if it's wrong

if len(sys.argv) < 3 or len(sys.argv) > 4:
    eprint_use()

else:
    database_path = sys.argv[1]
    partner_id = sys.argv[2]
    output_path = sys.argv[3] if len(sys.argv) == 4 else partner_id + '.txt'

    if not os.path.isfile(database_path):
        sys.exit('the file %s does not exist' % (database_path))

    connection = sqlite3.connect(database_path)
    cursor = connection.cursor()

    parser = HTMLParser.HTMLParser()

    cursor.execute("SELECT from_dispname,body_xml FROM Messages WHERE dialog_partner='" + partner_id + "' ORDER BY timestamp")

    for row in cursor.fetchall():
        print(u"<" + row[0] + u"> " + unicode(parser.unescape(unicode(row[1]))))


I've omitted most of the comments at the top that serve no purpose, so line 62 in the traceback refers to the very last line here.

I may be doing something wrong with the SQL queries at some point, and I don't really check whether the input is valid either, but that's not the point. Why does this happen? Why does calling the script differently cause it to crash when it works perfectly on its own? I've checked sys.argv as well, and it contains the same thing in both cases. Also, yes, I have an output_path variable that is not used yet; I'll adapt the output depending on the third parameter, writing to the file if it contains a filename. The weirdest part is: why does it cause a Unicode exception?

$ bash --version
GNU bash, version 4.3.48(1)-release (x86_64-pc-linux-gnu)
$ python2 --version
Python 2.7.10

Answer

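Probably the output encoding is UTF-8 in the first case (when it works) and ASCII in the second (when the UnicodeEncodeError happens). In Python 2, when stdout is a terminal, print encodes unicode strings with the terminal's encoding, which comes from your locale. Inside $(...), stdout is a pipe, sys.stdout.encoding is None, and print falls back to the ASCII codec, which cannot encode u'\xe0'. The newlines disappear for a separate reason: bash word-splits the unquoted $(...) before echo sees it, so echo "$(...)" would keep them.

Maybe try encoding explicitly: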

for row in cursor.fetchall():
    res = u"<" + row[0] + u"> " +  unicode(parser.unescape(unicode(row[1])))
    print(res.encode('utf-8'))
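
To see the mechanism in isolation, here is a minimal sketch, not the asker's actual script. The sample line, the can_encode helper, and the fake_pipe stream are all hypothetical names made up for illustration; the BytesIO object stands in for a piped stdout, which is just a byte stream with no encoding attached. It is written so that it runs on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import codecs
import io

# Hypothetical chat line containing u'\xe0', the character named in the traceback.
line = "<MiYa> \xe0 demain"


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding`, False otherwise."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# ASCII is what Python 2 falls back to when stdout has no encoding (a pipe).
print(can_encode(line, "ascii"))   # False
print(can_encode(line, "utf-8"))   # True

# Stand-in for a piped stdout: raw bytes only, no encoding information.
fake_pipe = io.BytesIO()

# Wrap the byte stream so every write is encoded as UTF-8 explicitly.
writer = codecs.getwriter("utf-8")(fake_pipe)
writer.write(line + "\n")
```

In a Python 2 script, the same wrapper is commonly applied to the real stream with sys.stdout = codecs.getwriter('utf-8')(sys.stdout), after which print no longer depends on how the script was invoked. Setting the environment variable PYTHONIOENCODING=utf-8 before running the script is another option that needs no code change.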