json_encode() requires a valid UTF-8 string. I have a string that may be in a different encoding. I need to ignore or substitute all invalid characters to be able to convert it to JSON.
- It should be something very simple and robust.
- The error is in a module used for manual checking, so mojibake is fine.
- The code responsible for fixing the encoding is in a different module. (It was broken, though.) I don't want to duplicate responsibility.
The hex of an example invalid string: 496e76616c6964206d61726b2096
My current solution:
$raw_str = hex2bin('496e76616c6964206d61726b2096');
$sane_str = @\iconv('UTF-8', 'UTF-8//IGNORE', $raw_str);
The three problems with my code:
- iconv looks a little too heavy.
- Many programmers don't like @.
- iconv may ignore too much: the whole string.
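To make the last point concrete, here is a rough sketch of the guard I would have to wrap around the current call. It assumes iconv() returns false when it gives up on the input; the helper name and the placeholder text are made up for illustration and are not part of any library.

```php
<?php
// Hypothetical helper (name and placeholder are illustrative only):
// wraps the current iconv() call and makes the "whole string lost" case explicit.
function sanitize_utf8_or_placeholder(string $raw, string $placeholder = '[invalid UTF-8]'): string
{
    // @ hides the notice iconv() raises on an illegal byte sequence.
    $sane = @iconv('UTF-8', 'UTF-8//IGNORE', $raw);

    // iconv() returns false on failure; fall back to a fixed placeholder
    // instead of passing false on to the JSON encoder.
    return $sane === false ? $placeholder : $sane;
}

// "Invalid mark " followed by the single byte 0x96, which is not valid UTF-8 here.
$raw_str  = hex2bin('496e76616c6964206d61726b2096');
$sane_str = sanitize_utf8_or_placeholder($raw_str);

// Either branch yields a valid UTF-8 string, so encoding should not fail.
var_dump(json_encode($sane_str) !== false);
```

This works around the failure case, but it makes the code even heavier, which is exactly what I would like to avoid.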
Any better idea?
There is a similar question, but I don't care about conversion.
Ensuring valid utf-8 in PHP