I have a socket server that is supposed to receive valid UTF-8 characters from clients.
The problem is that some clients (mainly hackers) send all the wrong kinds of data over it.
I can easily distinguish the genuine clients, but I log all the data they send to files so I can analyze it later.
Sometimes I get characters like this
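text = unicode(data, 'utf-8', errors='replace')  # Python 2: invalid bytes become U+FFFD; without 'utf-8' the default codec (ASCII) is used
text = unicode(data, 'utf-8', errors='ignore')   # alternative: invalid bytes are silently dropped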
Note: errors='ignore' strips out the offending bytes and returns the string without them. Only use it if you need to remove them rather than convert them.
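To make the difference between the two error handlers concrete, here is a minimal sketch using bytes.decode, which accepts the same error handler names and should behave the same way on Python 2.7 and Python 3. The sample payload is made up for illustration.

```python
# Hypothetical payload: plain ASCII with one byte (0xff) that is never valid in UTF-8.
raw = b"abc\xffdef"

# "replace" keeps a visible marker (U+FFFD) where the bad byte was.
replaced = raw.decode("utf-8", "replace")

# "ignore" drops the bad byte without a trace.
ignored = raw.decode("utf-8", "ignore")

print(repr(replaced))
print(repr(ignored))
```

For log analysis, "replace" is probably the safer choice, since the marker shows where a client sent bad data.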
For Python 3:
While reading the file:
with codecs.open(file_name, "r", encoding='utf-8', errors='ignore') as fdata:
    data = fdata.read()  # needs "import codecs" at the top of the file
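As a self-contained sketch of the same idea: in Python 3 the built-in open also accepts encoding and errors, so the codecs module is not strictly needed. The temporary file and its contents below are assumptions made up for the demo; in practice file_name would be your own log file.

```python
import os
import tempfile

# Create a hypothetical log file containing one invalid byte (0xff).
fd, file_name = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as f:
    f.write(b"good line\nbad \xff line\n")

# errors="ignore" drops undecodable bytes while reading;
# errors="replace" would substitute U+FFFD instead.
with open(file_name, "r", encoding="utf-8", errors="ignore") as fdata:
    lines = fdata.read().splitlines()

os.remove(file_name)  # clean up the demo file
print(lines)
```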