Franz Franz - 3 months ago 50
MySQL Question

How to replace/remove 4(+)-byte characters from a UTF-8 string in PHP?

It seems like MySQL does not support characters with more than 3 bytes in its default UTF-8 charset.

So, in PHP, how can I get rid of all 4(-and-more)-byte characters in a string and replace them with something like by some other character?

Answer

NOTE: you should not just strip, but replace with replacement character U+FFFD to avoid unicode attacks, mostly XSS:

http://unicode.org/reports/tr36/#Deletion_of_Noncharacters

preg_replace('/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $value);
Comments