build_code - 3 months ago
Python Question

Unicode-encode issues while sending desktop notification using Python

I am fetching the latest football scores from a website and sending them as desktop notifications (OS X). I am using BeautifulSoup to scrape the data. The Unicode data was causing this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)


So I added this at the top of the script, which fixed the problem when printing to the terminal:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')


But the problem persists when I send notifications to the desktop. I use terminal-notifier to send them:

import os

def notify(title, subtitle, message):
    t = '-title {!r}'.format(title)
    s = '-subtitle {!r}'.format(subtitle)
    m = '-message {!r}'.format(message)
    os.system('terminal-notifier {}'.format(' '.join((m, t, s))))


The images below show the output on the terminal vs. the desktop notification.

(Image: output on the terminal)

(Image: desktop notification)

Also, if I try to remove the commas from the string, I get an error:

new_scorer = str(new_scorer[0].text).replace(",","")


File "live_football_bbc01.py", line 41, in get_score
new_scorer = str(new_scorer[0].text).replace(",","")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)


How do I get the desktop notifications to show the same text as the terminal? Thanks!

Edit: Snapshot of the desktop notification (solved):

(Image: desktop notification)

Answer

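You are formatting with !r, which gives you the repr output, so a unicode value is shown in its escaped form (u'M\xfcller', prefix and quotes included) instead of as Müller. Forget the terrible reload(sys)/setdefaultencoding logic and either use unicode everywhere, encoding once at the very end: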

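import os
from pipes import quote  # shell-quoting; shlex.quote on Python 3

def notify(title, subtitle, message):
    # Without !r nothing puts quotes around the values, so quote them for the shell
    t = u'-title {}'.format(quote(title))
    s = u'-subtitle {}'.format(quote(subtitle))
    m = u'-message {}'.format(quote(message))
    # Encode once, right before the command string leaves Python
    os.system(u'terminal-notifier {}'.format(u' '.join((m, t, s))).encode('utf-8'))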

or encode each value to UTF-8 up front:

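import os
from pipes import quote  # shell-quoting; shlex.quote on Python 3

def notify(title, subtitle, message):
    # Encode each value to UTF-8 bytes, then quote it for the shell
    t = '-title {}'.format(quote(title.encode("utf-8")))
    s = '-subtitle {}'.format(quote(subtitle.encode("utf-8")))
    m = '-message {}'.format(quote(message.encode("utf-8")))
    os.system('terminal-notifier {}'.format(' '.join((m, t, s))))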
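Once !r is gone, nothing wraps the values in quotes any more, so a title such as Bayern 2-1 Dortmund would be split into separate words by the shell. Below is a minimal sketch of building the command string with each value shell-quoted, so it can be inspected before it is run. `build_command` is a hypothetical helper name, and the sketch assumes `shlex.quote` (Python 3.3+) or the older, undocumented `pipes.quote` on Python 2.7.

```python
import shlex

try:
    quote = shlex.quote  # Python 3.3+
except AttributeError:
    from pipes import quote  # Python 2.7 keeps the same function here


def build_command(title, subtitle, message):
    """Build the terminal-notifier shell command (hypothetical helper).

    quote() wraps a value in single quotes when it contains anything unsafe,
    so spaces, apostrophes and non-ASCII characters reach the program as a
    single argument.
    """
    t = u'-title {}'.format(quote(title))
    s = u'-subtitle {}'.format(quote(subtitle))
    m = u'-message {}'.format(quote(message))
    return u'terminal-notifier {}'.format(u' '.join((m, t, s)))
```

On Python 2 you would then call os.system(build_command(...).encode('utf-8')); on Python 3, os.system accepts the str directly.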

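When you call str(new_scorer[0].text).replace(",","") you are implicitly encoding the unicode text with the default ascii codec, which cannot represent u'\xfc'. Either drop the str() call and work on the unicode directly, or specify the encoding yourself: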

In [13]: s1=s2=s3= u'\xfc'

In [14]: str(s1) # tries to encode to ascii
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-14-589849bdf059> in <module>()
----> 1 str(s1)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 0: ordinal not in range(128)

In [15]: "{}".format(s1) + "{}".format(s2) + "{}".format(s3) # tries to encode to ascii
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-15-7ca3746f9fba> in <module>()
----> 1 "{}".format(s1) + "{}".format(s2) + "{}".format(s3)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 0: ordinal not in range(128)

You can encode straight away:

In [16]: "{}".format(s1.encode("utf-8")) + "{}".format(s2.encode("utf-8")) + "{}".format(s3.encode("utf-8"))
Out[16]: '\xc3\xbc\xc3\xbc\xc3\xbc'

Or use unicode throughout, prefixing the format strings with u, and encode only at the end:

In [17]: out = u"{}".format(s1) + u"{}".format(s2) + u"{}".format(s3)
In [18]: out
Out[18]: u'\xfc\xfc\xfc'

In [19]: out.encode("utf-8")
Out[19]: '\xc3\xbc\xc3\xbc\xc3\xbc'
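The same rule fixes the comma-stripping line from the question: BeautifulSoup's .text is already unicode, so skip str() and call replace on it directly, encoding only where bytes are actually needed. A minimal sketch, with `clean_scorer` as a hypothetical helper and a made-up string standing in for the scraped text:

```python
def clean_scorer(text):
    """Strip commas from scraped text without calling str() on it.

    `text` is assumed to be a unicode string, e.g. BeautifulSoup's .text.
    """
    return text.replace(u",", u"")


scorer = clean_scorer(u"M\xfcller, 45'")  # still unicode: u"M\xfcller 45'"
scorer_bytes = scorer.encode("utf-8")     # encode once, only where bytes are needed
```

This runs unchanged on Python 2.7 and on Python 3.3+, where the u prefix is accepted again.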

If you use !r you are always going to get the escaped repr (u'\xfc') in the output instead of the character itself:

In [30]: print "{}".format(s1.encode("utf-8"))
ü

In [31]: print "{!r}".format(s1).encode("utf-8")
u'\xfc'

You can also pass the arguments as a list using subprocess. No shell is involved then, so nothing needs quoting:

from subprocess import check_call


def notify(title, subtitle, message):
    # Each flag and each value is its own list item, so no shell
    # splits or interprets them
    check_call(['terminal-notifier',
                '-title', title.encode("utf-8"),
                '-subtitle', subtitle.encode("utf-8"),
                '-message', message.encode("utf-8")])
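If you want to check the argument list without popping up a notification, you can build it in its own function and hand it to check_call afterwards. A minimal sketch, assuming a hypothetical `notifier_argv` helper: on Python 2 it encodes each value to UTF-8 bytes as above, while on Python 3 it leaves the text alone, because subprocess there encodes str arguments itself with the filesystem encoding (UTF-8 on OS X).

```python
import sys


def notifier_argv(title, subtitle, message):
    """Return the argument list for terminal-notifier (hypothetical helper)."""
    def arg(value):
        # Python 2 needs bytes in argv; Python 3 accepts text and encodes it
        if sys.version_info[0] < 3:
            return value.encode("utf-8")
        return value

    return ['terminal-notifier',
            '-title', arg(title),
            '-subtitle', arg(subtitle),
            '-message', arg(message)]
```

Usage: check_call(notifier_argv(title, subtitle, message)), with check_call imported from subprocess.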