Martin_v - 4 months ago
Python Question

Encoding csv files on opening with Python

So I have this CSV file, which has rows like these:

"41975","IT","Catania","2016-01-12T10:57:50+01:00",409.58
"538352","DE","Düsseldorf","2015-12-18T20:50:21+01:00",95.03
"V22211","GB","Nottingham","2015-12-31T11:17:59+00:00",872


In the current example the first and the third rows work fine, but the program crashes when it prints Düsseldorf; the ü is the problematic character.

I want to be able to get the information from this CSV file and to be able to print it. Here is my code:

import csv

def load_sales(file_name):
    # Column positions within each CSV row
    SALES_ID = 0
    SALES_COUNTRY = 1
    SALES_CITY = 2
    SALES_DATE = 3
    SALES_PRICE = 4
    with open(file_name, 'r', newline='', encoding='utf8') as r:
        reader = csv.reader(r)
        result = []
        for row in reader:
            sale = {}
            sale["id"] = row[SALES_ID]
            sale["country"] = row[SALES_COUNTRY]
            sale["city"] = row[SALES_CITY]
            sale["date"] = row[SALES_DATE]
            sale["price"] = float(row[SALES_PRICE])
            result.append(sale)
    # Return the parsed rows so the caller can print them
    return result


When I print the result I get:

File "C:\Anaconda3\lib\encodings\cp866.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xfc' in position 384: character maps to <undefined>


So far I have tried changing the encoding value in the open function to utf-8, UTF8, etc., and writing a print function:

def write_uft8(data):
    print(data).encode('utf-8')


But this is not a viable approach when you have to print a list of dictionaries.

Someone told me that the problem is that my Python is not set up to encode these messages as UTF-8. Is that true, and how do I change it?

Answer

The issue here is that when Python writes to a stream, it attempts to encode the text in a way that is compatible with the encoding or character set of that stream.
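To see which encoding your own output stream uses, and to reproduce the failure without a Windows console, a small sketch like the following may help. The `fake_console` object and the `can_write` helper are made up for illustration; they only simulate a stream that uses CP866.

```python
import io
import sys

# The encoding print() will use for the real standard output on this machine.
print(sys.stdout.encoding)

# Hypothetical stand-in for a console stream that uses code page 866.
fake_console = io.TextIOWrapper(io.BytesIO(), encoding="cp866")

def can_write(stream, text):
    """Return True if the stream can encode and accept the given text."""
    try:
        stream.write(text)
        return True
    except UnicodeEncodeError:
        return False

print(can_write(fake_console, "Catania"))     # plain ASCII is fine
print(can_write(fake_console, "Düsseldorf"))  # 'ü' is not in CP866
```

Note that reading the file with encoding='utf8' is not what fails here; the error only appears when the text is written out again.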

In this case, it appears you are running the command in a Windows console that is set to display Cyrillic text (CP866). The Cyrillic code page does not contain a character corresponding to ü, so the string cannot be encoded into bytes that the console can display.
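You can check this at the codec level with plain str.encode, independent of any console. This is only a sketch; the `encodable` helper is a name I made up for the example.

```python
city = "Düsseldorf"

def encodable(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(encodable(city, "cp866"))   # False: CP866 has no 'ü'
print(encodable(city, "utf-8"))   # True: UTF-8 covers all of Unicode

# With an error handler the encode no longer raises, but the character is lost.
print(city.encode("cp866", errors="replace"))  # b'D?sseldorf'
```

The traceback in the question is the first case: the default error handler is strict, so the missing character raises UnicodeEncodeError.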

Changing the active code page of your Windows cmd console to UTF-8 (code page 65001) should help:

$ CHCP 65001
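If changing the code page is not an option, a possible fallback on the Python side is to tell the output stream to escape characters it cannot encode instead of raising. This is a sketch that assumes Python 3.7 or later (where text streams have a reconfigure method); `make_tolerant` is a hypothetical helper, and the CP866 console is simulated with an in-memory stream.

```python
import io
import sys

def make_tolerant(stream):
    """Ask a text stream to escape unencodable characters (Python 3.7+)."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False  # the stream does not support reconfiguring
    reconfigure(errors="backslashreplace")
    return True

# Demonstration on a simulated CP866 console (an assumption for illustration).
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp866")
make_tolerant(console)
print([{"city": "Düsseldorf", "price": 95.03}], file=console)
console.flush()
print(raw.getvalue())

# In a real script you would call make_tolerant(sys.stdout) once at startup.
```

This avoids the crash when printing a list of dictionaries, but the ü is shown as an escape sequence rather than as the real character, so switching the console to UTF-8 remains the better fix when it is available.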