umLu umLu - 7 months ago 151
Python Question

Write numpy unicode array to a text file

I'm trying to export a numpy array that contains unicode elements to a text file.

So far I have the following working, but the array doesn't contain any non-ASCII characters:

import numpy as np

array_unicode=np.array([u'maca',u'banana',u'morango'])

with open('array_unicode.txt','wb') as f:
    np.savetxt(f,array_unicode,fmt='%s')


If I change 'c' from 'maca' to 'ç' I get an error:

import numpy as np

array_unicode=np.array([u'maça',u'banana',u'morango'])

with open('array_unicode.txt','wb') as f:
    np.savetxt(f,array_unicode,fmt='%s')


Traceback:

Traceback (most recent call last):
File "<ipython-input-48-24ff7992bd4c>", line 8, in <module>
np.savetxt(f,array_unicode,fmt='%s')
File "C:\Anaconda2\lib\site-packages\numpy\lib\npyio.py", line 1158, in savetxt
fh.write(asbytes(format % tuple(row) + newline))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 2: ordinal not in range(128)


How can I set savetxt from numpy to write unicode characters?

Answer

There are many ways to accomplish this. However, numpy arrays need to be set up in a specific way (usually with an explicit dtype) to hold unicode characters in these circumstances.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np

dt = np.dtype((str, 10))  # fixed-width string dtype of length 10; the (type, size) pair must be a single tuple
array_unicode=np.array(['maça','banana','morangou'], dtype=dt)

with open('array_unicode.txt','wb') as f:
    np.savetxt(f, array_unicode, fmt='%s')

You'll need to be aware of the string lengths in your array as well as the length you set in the dtype. If it's too short, your data will be truncated; if it's too long, memory is wasted. I suggest reading the NumPy data type objects (dtype) documentation, as there are many other ways to set up the array depending on the data format.

http://docs.scipy.org/doc/numpy-1.9.3/reference/arrays.dtypes.html
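
As a rough illustration of the length trade-off described above, here is a minimal sketch. It uses fixed-width unicode dtypes ('U4' and 'U10') rather than the byte-string dtype from the answer; those widths and the variable names are assumptions picked only for the demonstration.

```python
# -*- coding: utf-8 -*-
import numpy as np

words = [u'maça', u'banana', u'morango']

# A width of 4 is too short for 'banana' and 'morango':
# NumPy truncates them silently instead of raising an error.
too_short = np.array(words, dtype='U4')

# A width of 10 fits every element, at the cost of unused space per item.
wide_enough = np.array(words, dtype='U10')

# With no dtype given, NumPy sizes the array to the longest element.
inferred = np.array(words)

print(too_short.tolist())
print(wide_enough.dtype.itemsize)  # bytes reserved per element
print(inferred.dtype.itemsize)
```

Comparing the two itemsize values shows how much space the wider dtype reserves per element, while the first print shows which elements lost characters.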

Here's an alternative: a function that encodes each unicode element to UTF-8 before saving:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy as np

array_unicode=np.array([u'maça',u'banana',u'morangou'])

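def uniArray(array_unicode):
    # Encode every element to a UTF-8 byte string so savetxt never has to encode it as ASCII
    items = [x.encode('utf-8') for x in array_unicode]
    # [items] makes a single row, so all values land on one line;
    # pass items without the brackets to write one value per line
    array_unicode = np.array([items])
    return array_unicode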

with open('array_unicode.txt','wb') as f:
    np.savetxt(f, uniArray(array_unicode), fmt='%s')

Basically, uniArray does a quick conversion of each element to UTF-8 bytes, and np.savetxt then writes the result. There might be better ways than this; it's been a while since I've used numpy, and it has always seemed somewhat touchy with encodings.
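
One possible alternative, sketched here under stated assumptions, is to skip savetxt and write the elements through a file object opened with an explicit encoding. The helper name write_unicode_lines and the temporary file location are hypothetical and not part of the original answer; io.open is used because it accepts an encoding argument on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

import numpy as np


def write_unicode_lines(path, array):
    """Write one array element per line, encoded as UTF-8 (hypothetical helper)."""
    # The file object does the encoding, so no element is ever pushed through ASCII.
    with io.open(path, 'w', encoding='utf-8') as f:
        for item in array:
            f.write(u'%s\n' % item)


array_unicode = np.array([u'maça', u'banana', u'morango'])
path = os.path.join(tempfile.gettempdir(), 'array_unicode.txt')
write_unicode_lines(path, array_unicode)

# Read the file back to check that the round trip preserved the characters.
with io.open(path, encoding='utf-8') as f:
    lines = f.read().splitlines()
print(lines)
```

On NumPy 1.14 or later, np.savetxt itself also accepts an encoding argument (for example np.savetxt('array_unicode.txt', array_unicode, fmt='%s', encoding='utf-8')), which may make a helper like this unnecessary.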
