
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c

I have a socket server that is supposed to receive valid UTF-8 characters from clients.

The problem is that some clients (mainly hackers) send all kinds of invalid data over it.

I can easily distinguish the genuine clients, but I log everything that is sent to files so I can analyze it later.

Sometimes I get characters (such as the byte 0x9c) that cause the UnicodeDecodeError in the title.

I need to be able to make the string UTF-8 with or without those characters.

Answer Source

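In Python 2, either replace the invalid bytes:

str = unicode(str, 'utf-8', errors='replace')

or drop them:

str = unicode(str, 'utf-8', errors='ignore')

Pass the encoding explicitly; without it, Python 2 decodes with the default codec, which is usually ASCII.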

Note: errors='ignore' strips out the offending bytes and returns the string without them. Use it only if you want them removed rather than replaced.
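To make the difference between the two error handlers concrete, here is a minimal sketch using Python 3's bytes.decode, which accepts the same errors values. The sample bytes and the variable names (raw, replaced, ignored, strict_error) are made up for illustration and are not from the question.

```python
# Bytes as they might arrive from a misbehaving client.
# 0x9c on its own is not a valid UTF-8 sequence.
raw = b"abc\x9cdef"

# Strict decoding is the default and raises UnicodeDecodeError.
strict_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc

# 'replace' substitutes U+FFFD (the replacement character) for the bad byte.
replaced = raw.decode("utf-8", errors="replace")

# 'ignore' silently drops the bad byte.
ignored = raw.decode("utf-8", errors="ignore")

print(repr(replaced))  # 'abc\ufffddef'
print(repr(ignored))   # 'abcdef'
```

With 'replace' you can still see where the invalid data was; with 'ignore' that information is lost.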

For Python 3:

While reading the file:

with open(path, "r", encoding='utf-8', errors='ignore') as fdata:  # path is a placeholder for your file name
    data = fdata.read()
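Since the goal in the question is to analyze the logged data later, it may be worth keeping the invalid bytes visible instead of dropping them. The sketch below assumes Python 3.5 or newer, where the 'backslashreplace' handler also works for decoding. The temporary log file and the names log_path, ignored_text and escaped_text are invented for this demo only.

```python
import os
import tempfile

# Create a throwaway log file containing one invalid byte (0x9c).
with tempfile.NamedTemporaryFile(delete=False, suffix=".log") as tmp:
    tmp.write(b"hello \x9c world\n")
    log_path = tmp.name

# errors='ignore' drops the bad byte while reading.
with open(log_path, "r", encoding="utf-8", errors="ignore") as fdata:
    ignored_text = fdata.read()

# errors='backslashreplace' keeps the bad byte as a visible \x9c escape.
with open(log_path, "r", encoding="utf-8", errors="backslashreplace") as fdata:
    escaped_text = fdata.read()

os.remove(log_path)

print(repr(ignored_text))
print(repr(escaped_text))
```

The escaped form is plain text that can be stored as UTF-8 yet still shows exactly which bytes the client sent.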
