data_garden data_garden - 16 days ago 10
Python Question

UnicodeEncodeError: 'ascii' codec can't encode

Ia have the following data container which is constantly being updated:

data = []
for val, track_id in zip(values,list(track_ids)):
#below
if val < threshold:
#structure data as dictionary
pre_data = {"artist": sp.track(track_id)['artists'][0]['name'], "track":sp.track(track_id)['name'], "feature": filter_name, "value": val}
data.append(pre_data)
#write to file
with open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w') as f:
json.dump(data,f, ensure_ascii=False, indent=4, sort_keys=True)


but I am getting a lot of errors like this:

json.dump(data,f, ensure_ascii=False, indent=4, sort_keys=True)
File"/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 190, in dump
fp.write(chunk)

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)


Is there a way I can get rid of this encoding problem once and for all?

I was told that this would do it:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')


but many people do not recommend it.

I use
python 2.7.10


any clues?

Answer

When you write to a file that was opened in text mode, Python encodes the string for you. The default encoding is ascii, which generates the error you see; there are a lot of characters that can't be encoded to ASCII.

The solution is to open the file in a different encoding. In Python 2 you must use the codecs module, in Python 3 you can add the encoding= parameter directly to open. utf-8 is a popular choice since it can handle all of the Unicode characters, and for JSON specifically it's the standard; see https://en.wikipedia.org/wiki/JSON#Data_portability_issues.

import codecs
with codecs.open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w', encoding='utf-8') as f: