cqcn1991 - 2 months ago
Python Question

Jupyter Notebook: How to output Chinese?

The column ['douban_info'] in my dataset holds information about movies in Chinese, stored as JSON. When I evaluate

df['douban_info'][0]

the output shows every Chinese string as escape sequences such as

\u7834\u6653\u8005

which I can't read with ease. Is it possible to make Python show the original Chinese characters in the output?

I'm on Python 2.7.

Answer

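This is how Python 2 works. By default it shows the repr() of a value when it echoes the result of an expression, and the repr() of a list or dict uses the repr() of each item. The repr() of a unicode string escapes non-ASCII characters as \uXXXX. To see the Unicode characters, print the string itself: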

>>> D = {u'aka': [u'2019\u730e\u8840\u90fd\u5e02(\u6e2f)', u'\u9ece\u660e\u65f6\u5206']}
>>> D[u'aka'][0]
u'2019\u730e\u8840\u90fd\u5e02(\u6e2f)'
>>> print D[u'aka'][0]
2019猎血都市(港)
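
The same applies to a list: in Python 2, printing the list itself still shows the escaped repr() of each item. For a flat list of strings, one simple workaround is to join the items into a single unicode string and print that. This is only a sketch; the variable name joined is made up for illustration, and D is the example dictionary from above.

```python
D = {u'aka': [u'2019\u730e\u8840\u90fd\u5e02(\u6e2f)', u'\u9ece\u660e\u65f6\u5206']}

# Joining yields one unicode string, so print shows the characters
# themselves rather than the repr() of each list item.
joined = u', '.join(D[u'aka'])
print(joined)
```

This only helps for flat lists of strings; nested structures need a different approach.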

If you can't move to Python 3 and don't like the default repr() display, you'll have to write your own display routine. Something like:

D = {u'aka':[u'2019\u730e\u8840\u90fd\u5e02(\u6e2f)',u'\u9ece\u660e\u65f6\u5206']}

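def dump(item):
    """Return a display string for nested dicts/lists that keeps Unicode text readable."""
    L = []
    if isinstance(item, dict):
        for k, v in item.items():
            # Build each 'key: value' pair as one piece so ', ' does not split it.
            L.append(dump(k) + ': ' + dump(v))
        return '{' + ', '.join(L) + '}'
    elif isinstance(item, list):
        for i in item:
            L.append(dump(i))
        return '[' + ', '.join(L) + ']'
    elif isinstance(item, unicode):
        # Show the text itself instead of its escaped repr().
        return u"u'" + item + u"'"
    else:
        # Numbers, None, byte strings, etc.: fall back to repr().
        return repr(item)

print dump(D)

Output:

{u'aka': [u'2019猎血都市(港)', u'黎明时分']}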

Note this is by no means complete as a generic dumping utility.
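
Since the data in the question is JSON-like (dicts, lists, strings, numbers), one alternative to a hand-written routine is the standard json module: json.dumps() accepts ensure_ascii=False, which leaves non-ASCII characters unescaped. A minimal sketch, assuming the value contains only JSON-serializable types; the variable name text is made up for illustration.

```python
import json

D = {u'aka': [u'2019\u730e\u8840\u90fd\u5e02(\u6e2f)', u'\u9ece\u660e\u65f6\u5206']}

# ensure_ascii=False keeps non-ASCII characters as they are
# instead of emitting \uXXXX escapes.
text = json.dumps(D, ensure_ascii=False)
print(text)
```

Note that the result uses JSON syntax (double quotes, no u prefix) rather than Python literal syntax, so it is meant for reading, not for pasting back into Python source.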

In Python 3, repr() has been updated to leave printable non-ASCII characters unescaped:

>>> print(D)
{'aka': ['2019猎血都市(港)', '黎明时分']}
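
For comparison, Python 3 still offers the escaped form through the built-in ascii(), which behaves like repr() but escapes non-ASCII characters. A short Python 3 only sketch; the variable names are made up for illustration.

```python
title = '\u9ece\u660e\u65f6\u5206'  # second 'aka' entry from the example

readable = repr(title)  # Python 3: printable non-ASCII characters are kept
escaped = ascii(title)  # like repr(), but non-ASCII characters are escaped

print(readable)
print(escaped)
```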