How can I replace characters such as emojis?
MySQL's `utf8` character set encodes precisely the Basic Multilingual Plane (BMP). Rather than targeting emoji specifically, you need to exclude all code points from the supplementary planes, since in MySQL these require four bytes and therefore the `utf8mb4` character set.
Since you appear to be matching against 16-bit rather than 32-bit wide strings, a code point outside the BMP is encoded as a so-called "high surrogate" in the range `0xD800..0xDBFF`, followed by a "low surrogate" in the range `0xDC00..0xDFFF`. The corresponding regex therefore is `[\uD800-\uDBFF][\uDC00-\uDFFF]`.
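A minimal sketch of the surrogate-pair approach, assuming Python 3. Because a Python 3 `str` is code-point based, it never contains surrogate pairs on its own, so the hypothetical helper `to_utf16_units` first splits each non-BMP character into its high and low surrogates to simulate a 16-bit string. The names `SURROGATE_PAIR` and `to_utf16_units` are illustrative, not part of any library.

```python
import re

# One supplementary-plane code point as it appears in a 16-bit (UTF-16)
# string: a high surrogate immediately followed by a low surrogate.
SURROGATE_PAIR = re.compile(r'[\ud800-\udbff][\udc00-\udfff]')


def to_utf16_units(s):
    """Hypothetical helper: simulate a 16-bit string by replacing every
    code point above U+FFFF with its UTF-16 surrogate pair."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high surrogate
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low surrogate
        else:
            out.append(ch)                          # BMP character, unchanged
    return ''.join(out)


narrow = to_utf16_units('I \u2665 \U0001F600!')
cleaned = SURROGATE_PAIR.sub('', narrow)  # drops the emoji, keeps the heart
print(cleaned == 'I \u2665 !')  # prints True
```

On a genuinely 16-bit string (for example one coming from a UTF-16 API), only the `SURROGATE_PAIR.sub` step is needed.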
♥ will not match this, since it is `u'\u2665'`. Strictly speaking, I think it only counts as an emoji when followed by the variation selector `U+FE0F`, but either way both code points are safely in the BMP.
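If your strings are instead code-point based (as Python 3 `str` is), no surrogates appear and the same filter can be written as a single character range above U+FFFF. The sketch below assumes Python 3; `NON_BMP` and `strip_non_bmp` are hypothetical names. It also checks that ♥ followed by `U+FE0F` survives, since both code points lie in the BMP.

```python
import re

# On a code-point-based string, any character above U+FFFF lies outside
# the BMP (and would need utf8mb4 to be stored in MySQL).
NON_BMP = re.compile(r'[\U00010000-\U0010FFFF]')


def strip_non_bmp(s):
    """Hypothetical helper: remove every supplementary-plane code point."""
    return NON_BMP.sub('', s)


heart = '\u2665\ufe0f'  # BLACK HEART SUIT + VARIATION SELECTOR-16
print(all(ord(ch) <= 0xFFFF for ch in heart))  # prints True
print(strip_non_bmp('I ' + heart + ' \U0001F600') == 'I ' + heart + ' ')  # prints True
```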