As you all know emoji symbols are coded up to 3 or 4 bytes, so it may occupy 2 symbols in my string. For example '
\u.... notation has four hex digits, no less, no more, so it can only represent code points up to U+FFFF. Unicode characters above that are represented as pairs of surrogate code points.
For example, you could look for code points in the range
[\uD800-\uDBFF] (high surrogates), and when you find one, check that the next code point in the string is in the range
[\uDC00-\uDFFF] (if not, there is a serious data error), interpret the two as a Unicode character, and replace them by whatever you wish to put there. This looks like a job for a simple loop through the string, rather than a regular expression.