RainingChain - 5 months ago
Javascript Question

Convert UTF-8 string with only 8 bits per character

I have a Javascript string that contains characters that have a charCode greater than 255.

I want to be able to encode/decode that string into another string whose charCodes are all less than or equal to 255. There is no restriction on the characters (e.g. they can be non-printable).

I want a solution that is as fast as possible and that produces a string as small as possible. It must also work for any UTF-8 character.

I found out that encodeURI does exactly that, but it seems to take a lot of space.

encodeURI('ĉ') === "%C4%89"
6 bytes...

Is there anything better than encodeURI?

Answer

What you want to do is encode your string as UTF-8. Googling for how to do that in JavaScript, I found http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html, which gives:

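// Percent-encode the string as UTF-8, then collapse each %XX escape
// into a single character whose charCode is that byte (<= 255).
function encode_utf8( s ) {
  return unescape( encodeURIComponent( s ) );
}

// Re-escape the byte characters as %XX, then decode the
// percent-encoded UTF-8 back into the original string.
function decode_utf8( s ) {
  return decodeURIComponent( escape( s ) );
}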

or in short, almost exactly what you found already, plus unescaping each '%xx' code into a single character whose charCode is that byte, so 'ĉ' takes 2 characters instead of 6.
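
Since escape and unescape are deprecated, here is a minimal sketch of the same idea built on TextEncoder and TextDecoder instead. It assumes a runtime that provides both (modern browsers, recent Node.js). The helper names toByteString and fromByteString are made up for illustration and do not come from the linked article.

```javascript
// Hypothetical helpers: not from the linked article, just one possible sketch.

// Turn any JavaScript string into a string whose charCodes are all <= 255,
// one character per UTF-8 byte.
function toByteString(s) {
  const bytes = new TextEncoder().encode(s); // Uint8Array of UTF-8 bytes
  let out = '';
  for (let i = 0; i < bytes.length; i++) {
    out += String.fromCharCode(bytes[i]);
  }
  return out;
}

// Reverse: read each charCode as one byte and decode the bytes as UTF-8.
function fromByteString(b) {
  const bytes = new Uint8Array(b.length);
  for (let i = 0; i < b.length; i++) {
    bytes[i] = b.charCodeAt(i);
  }
  // ignoreBOM: true keeps a leading U+FEFF instead of silently dropping it.
  return new TextDecoder('utf-8', { ignoreBOM: true }).decode(bytes);
}
```

For example, toByteString('ĉ') gives a 2-character string with charCodes 0xC4 and 0x89, and fromByteString turns it back into 'ĉ'. One behavioural difference to keep in mind: for a string containing a lone surrogate, encodeURIComponent throws a URIError, whereas TextEncoder substitutes U+FFFD, so such strings will not round-trip with this sketch.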