user7001260 user7001260 - 1 year ago 90
Python Question

String encode/decode issue - missing character from end

I have a column of type … in my database, and I am unable to convert its content to a plain string in my code. (I am using … for the database connection.)

# This unicode string is returned by the database
>>> my_string = u'\u4157\u4347\u6e65\u6574\u2d72\u3430\u3931\u3530\u3731\u3539\u3533\u3631\u3630\u3530\u3330\u322d\u3130\u3036\u3036\u3135\u3432\u3538\u2d37\u3134\u3039\u352d'

# prints something in Chinese
>>> print my_string

The closest I have got is by encoding it to UTF-16:

>>> my_string.encode('utf-16')
>>> print my_string.encode('utf-16')

But the actual value I need, as stored in the database, is:


I also tried encoding it to … but nothing seemed to work.

Does anyone have an idea what I am missing, and how to get the desired result from the …?
Edit: On converting it to UTF-16-LE, I am able to remove the unwanted characters from the start, but one character is still missing from the end:

>>> print t.encode('utf-16-le')

For some other columns it works. What might be the cause of this intermittent issue?
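The "unwanted characters from the start" are very likely a byte order mark (BOM): Python's `utf-16` codec prepends one and then writes the code units in the machine's native byte order, while `utf-16-le` fixes the byte order and writes no BOM. A minimal Python 3 check on a short sample string (the sample value is an arbitrary choice, not from the database):

```python
# Python 3: compare the two UTF-16 variants on a short sample string.
sample = "AB"

with_bom = sample.encode("utf-16")   # BOM first, then code units in native byte order
no_bom = sample.encode("utf-16-le")  # explicit little endian, no BOM

print(with_bom)  # b'\xff\xfeA\x00B\x00' on a little-endian machine
print(no_bom)    # b'A\x00B\x00'
```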

Answer

You have a major problem in your database definition, in the way you store values in it, or in the way you read values from it. I can only explain what you are seeing; I cannot tell why it happens or how to fix it without knowing:

  • the type of the database
  • the way you input values in it
  • the way you extract values to obtain your pseudo unicode string
  • the actual content if you use direct (native) database access

What you get is an ASCII string whose 8-bit characters have been grouped in pairs to build 16-bit Unicode characters in little-endian order. As the expected string has an odd number of characters, the last character was (irremediably) lost in translation: the original string ends with u'\u352d', where 0x2d is the ASCII code for '-' and 0x35 for '5', so the last complete pair decodes to '-5' and the final unpaired character was dropped. Demo:

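def cvt(ustring):
    """Split each 16-bit character back into its two 8-bit characters."""
    l = []
    for uc in ustring:
        l.append(chr(ord(uc) & 0xFF))         # low-order byte comes first (little endian)
        l.append(chr((ord(uc) >> 8) & 0xFF))  # then the high-order byte
    return ''.join(l)

print(cvt(my_string))  # ends with '-5': the final unpaired character is gone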
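A Python 3 sketch of the same idea, using the value quoted in the question: encoding with `utf-16-le` performs the same low-byte-then-high-byte split as `cvt`. The helper `pack_as_utf16le` is hypothetical (not part of any library or of the asker's code); it is only an assumed model of the faulty storage path described above, showing why an odd-length string loses its last character for good.

```python
# Python 3 sketch.

# Value quoted in the question, copied verbatim.
my_string = (
    '\u4157\u4347\u6e65\u6574\u2d72\u3430\u3931\u3530\u3731\u3539\u3533'
    '\u3631\u3630\u3530\u3330\u322d\u3130\u3036\u3036\u3135\u3432\u3538'
    '\u2d37\u3134\u3039\u352d'
)

# UTF-16-LE writes each code unit low byte first, with no BOM: the same split as cvt().
recovered = my_string.encode('utf-16-le').decode('ascii')
print(recovered)  # WAGCenter-04190517953516060503-20160605124857-4190-5


def pack_as_utf16le(text):
    """Hypothetical model of the faulty storage path (not a library function).

    Consecutive 8-bit characters are paired into one 16-bit character, the first
    of each pair becoming the low byte. A trailing unpaired character is dropped.
    """
    return ''.join(
        chr(ord(text[i]) | (ord(text[i + 1]) << 8))
        for i in range(0, len(text) - 1, 2)
    )


# Re-packing the recovered text reproduces the database value exactly...
print(pack_as_utf16le(recovered) == my_string)        # True
# ...and so does appending any single extra character.
print(pack_as_utf16le(recovered + 'X') == my_string)  # True
print(pack_as_utf16le(recovered + '7') == my_string)  # True
```

The `'X'` and `'7'` are placeholders: under this model every possible final character packs to the same stored value, which is why the missing character cannot be reconstructed from `my_string` alone.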
