Michas Michas - 1 year ago 153
PHP Question

Sanitise UTF-8 in PHP

The function json_encode requires a valid UTF-8 string. I have a string that may be in a different encoding. I need to ignore or substitute all invalid characters to be able to convert it to JSON.

  1. It should be something very simple and robust.

  2. The error is in a module for manual checking, so mojibake is fine.

  3. The code responsible for fixing the encoding is in a different module. (It was broken, though.) I don't want to duplicate responsibility.
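For context, the failure that motivates the question can be reproduced with a few lines. This is only a minimal sketch: it assumes a reasonably recent PHP with the mbstring extension loaded, and it reuses the example bytes from this question.

```php
<?php
// Same bytes as the question's example: "Invalid mark " followed by a lone 0x96 byte,
// which is not a valid UTF-8 sequence on its own.
$raw_str = hex2bin('496e76616c6964206d61726b2096');

// json_encode() refuses malformed UTF-8: it returns false and records an error code.
$json = json_encode($raw_str);

var_dump($json === false);                        // did encoding fail?
var_dump(json_last_error() === JSON_ERROR_UTF8);  // was the failure a UTF-8 error?
var_dump(mb_check_encoding($raw_str, 'UTF-8'));   // is the input valid UTF-8?
```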

The hex of an example invalid string: 496e76616c6964206d61726b2096

My current solution:

$raw_str = hex2bin('496e76616c6964206d61726b2096');
$sane_str = @\iconv('UTF-8', 'UTF-8//IGNORE', $raw_str);

The three problems with my code:

  1. The iconv call looks a little too heavy.

  2. Many programmers don't like the @ error-suppression operator.

  3. The iconv call may ignore too much: the whole string.

Any better idea?

There is a similar question, but I don't care about conversion.
Ensuring valid utf-8 in PHP

Answer Source

I think this is the best solution.

$raw_str = hex2bin('496e76616c6964206d61726b2096');
$sane_str = mb_convert_encoding($raw_str, 'UTF-8', 'UTF-8');
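
A slightly fuller sketch of how this might be used, assuming the mbstring extension is available. The helper name sanitise_utf8 is hypothetical and not part of PHP. The last line shows a possible alternative, the JSON_INVALID_UTF8_SUBSTITUTE flag, which only exists in PHP 7.2 or later.

```php
<?php
// Hypothetical helper name; it only wraps the call shown in the answer.
function sanitise_utf8(string $raw): string
{
    // Converting from UTF-8 to UTF-8 makes mbstring replace each invalid byte
    // sequence with its substitute character (configurable via mb_substitute_character()).
    return mb_convert_encoding($raw, 'UTF-8', 'UTF-8');
}

$raw_str  = hex2bin('496e76616c6964206d61726b2096');
$sane_str = sanitise_utf8($raw_str);

// The sanitised string should now pass validation and encode without error.
var_dump(mb_check_encoding($sane_str, 'UTF-8'));
echo json_encode($sane_str), PHP_EOL;

// Alternative, assuming PHP 7.2+: let json_encode() substitute invalid bytes itself.
echo json_encode($raw_str, JSON_INVALID_UTF8_SUBSTITUTE), PHP_EOL;
```

The mbstring route keeps the sanitising step separate from JSON encoding, so the cleaned string can be reused elsewhere. The flag is shorter when JSON output is the only goal.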