umLu - 1 year ago 325

Python Question

I'm trying to export a numpy array that contains unicode elements to a text file.

So far I have the following working, but the array doesn't contain any non-ASCII characters:

    import numpy as np

    array_unicode=np.array([u'maca',u'banana',u'morango'])

    with open('array_unicode.txt','wb') as f:
        np.savetxt(f,array_unicode,fmt='%s')

If I change the 'c' in 'maca' to 'ç', I get an error:

    import numpy as np

    array_unicode=np.array([u'maça',u'banana',u'morango'])

    with open('array_unicode.txt','wb') as f:
        np.savetxt(f,array_unicode,fmt='%s')

Traceback:

    Traceback (most recent call last):
      File "<ipython-input-48-24ff7992bd4c>", line 8, in <module>
        np.savetxt(f,array_unicode,fmt='%s')
      File "C:\Anaconda2\lib\site-packages\numpy\lib\npyio.py", line 1158, in savetxt
        fh.write(asbytes(format % tuple(row) + newline))
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 2: ordinal not in range(128)

How can I set `savetxt` to write unicode characters?

Answer

There are many ways to accomplish this; however, numpy arrays need to be set up in specific ways (usually using a `dtype`) to allow unicode characters in these circumstances.

```
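#!/usr/bin/python
# -*- coding: utf-8 -*-
import numpy as np

# Fixed-width string dtype: each element holds at most 10 bytes (Python 2 str)
dt = np.dtype((str, 10))
array_unicode = np.array(['maça', 'banana', 'morangou'], dtype=dt)

# Binary mode: under Python 2 the elements are already UTF-8 encoded bytes
with open('array_unicode.txt', 'wb') as f:
    np.savetxt(f, array_unicode, fmt='%s')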
```

You'll need to be aware of the string lengths in your array as well as the length you set in the dtype. If it's too short, you'll truncate your data; if it's too long, it's wasteful. I suggest you read the **Numpy data type objects (dtype) documentation**, as there are many other ways you might set up the array depending on the data format.

↳ http://docs.scipy.org/doc/numpy-1.9.3/reference/arrays.dtypes.html
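To make the length caveat concrete, here is a minimal sketch of what a too-short dtype does. The sample words and the `S3`/`S10` widths are chosen only for illustration, and the strings are encoded to UTF-8 bytes explicitly so the snippet behaves the same on Python 2 and 3. Note that the width counts bytes, not characters, so a multi-byte character such as 'ç' can be cut in half.

```python
# -*- coding: utf-8 -*-
import numpy as np

# Illustrative sample data, encoded to UTF-8 bytes up front.
words = [u'maça'.encode('utf-8'), b'banana', b'morango']

# 'S3' keeps only the first 3 bytes of each element: too short, data is truncated.
short = np.array(words, dtype='S3')

# 'S10' is wide enough for every element; unused bytes are padding.
wide = np.array(words, dtype='S10')

print(short.tolist())  # truncated values; 'ç' (2 bytes in UTF-8) is split
print(wide.tolist())   # full values
print(wide.itemsize)   # bytes reserved per element
```

If you are unsure of the maximum length, letting numpy infer the width (by omitting `dtype`) avoids the truncation, at the cost of sizing every element to the longest one.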

Here's an alternative: a function that encodes the unicode strings to UTF-8 before saving:

```
#!/usr/bin/python
# -*- coding: utf-8 -*-
import numpy as np

array_unicode = np.array([u'maça', u'banana', u'morangou'])

def uniArray(array_unicode):
    # Encode each unicode element to UTF-8 bytes so savetxt can write it
    items = [x.encode('utf-8') for x in array_unicode]
    array_unicode = np.array([items])  # remove the brackets for line breaks
    return array_unicode

with open('array_unicode.txt', 'wb') as f:
    np.savetxt(f, uniArray(array_unicode), fmt='%s')
```

Basically, `uniArray` does a quick conversion to UTF-8 bytes and `np.savetxt` writes the result. There might be better ways than this, although it's been a while since I've used numpy; it has always seemed somewhat touchy with encodings.
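One possibly simpler route, if you are not tied to the Python 2 setup in the question: a sketch assuming Python 3 and NumPy 1.14 or newer, where `np.savetxt` accepts an `encoding` argument. The temporary output path is made up for the example. Passing a filename rather than a file opened in `'wb'` mode lets `savetxt` open the file with that encoding itself.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

import numpy as np

array_unicode = np.array([u'maça', u'banana', u'morango'])

# Hypothetical output location, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'array_unicode.txt')

# Assumes NumPy >= 1.14: savetxt opens the file as UTF-8 text.
np.savetxt(path, array_unicode, fmt='%s', encoding='utf-8')

# Read the file back to check that the non-ASCII character survived.
with io.open(path, encoding='utf-8') as f:
    lines = f.read().splitlines()

print(lines)
```

On older NumPy versions that lack the `encoding` argument, encoding the elements yourself, as in `uniArray` above, remains the fallback.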