PersianGulf - 4 months ago
Python Question

How to split a Unicode string into a list

I have the following code:

stru = "۰۱۲۳۴۵۶۷۸۹"
strlist = stru.decode("utf-8").split()
print strlist[0]


My output is:

۰۱۲۳۴۵۶۷۸۹


But when I use:

print strlist[1]


I get the following traceback:

IndexError: list index out of range


My question is: how can I split my string? Keep in mind that I get the string from a function, so treat it as a variable.

Answer

The split() method splits on whitespace by default. Your string contains no whitespace, so strlist is a list with a single element, strlist[0], which holds the whole string. That is why strlist[1] raises an IndexError.
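A small sketch of that behaviour, assuming the function in the question returns UTF-8 encoded bytes. The sample with spaces is made up for illustration and is not from the question:

```python
# -*- coding: utf-8 -*-
# UTF-8 encoded bytes, standing in for the value the question's function returns.
stru = u"۰۱۲۳۴۵۶۷۸۹".encode("utf-8")

# The text contains no whitespace, so split() returns a one-element list.
strlist = stru.decode("utf-8").split()
print(len(strlist))  # 1

# Hypothetical input with spaces: split() now yields one item per chunk.
spaced = u"۰۱۲ ۳۴۵ ۶۷۸۹".split()
print(len(spaced))  # 3
```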

If you want a list with one element for each Unicode code point, you have several options:

  • Function: list(stru.decode("utf-8"))
  • List comprehension: [item for item in stru.decode("utf-8")]
  • Don't convert at all. Do you really need a list? You can iterate over the unicode string just like over any other sequence type (for character in stru.decode("utf-8"): ...)
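The three options side by side, as a rough sketch. It again assumes the input is UTF-8 encoded bytes; the names text, chars and chars_lc are only illustrative. The loop prints code point numbers instead of the characters themselves so the output does not depend on the terminal encoding:

```python
# -*- coding: utf-8 -*-
# UTF-8 encoded bytes, standing in for the value the question's function returns.
stru = u"۰۱۲۳۴۵۶۷۸۹".encode("utf-8")

# Decode once and reuse the resulting unicode string.
text = stru.decode("utf-8")

# Option 1: list() turns the string into a list of single characters.
chars = list(text)

# Option 2: an equivalent list comprehension.
chars_lc = [item for item in text]

print(len(chars))  # 10

# Option 3: no list at all, iterate over the string directly.
for character in text:
    print(u"U+%04X" % ord(character))
```

After this, chars[1] is the second digit, so indexing no longer raises an IndexError.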