I am wondering about the prefix used in UTF-8 code notation. In some cases, it is represented as
Unicode character set
This has nothing to do with UTF-8. The \u notation is used by various languages (C, Java) to write Unicode characters in string literals. When the string "\uc774\ud2b8" is encountered, it will generally be encoded in UTF-8, which means it becomes the byte sequence 0xEC 0x9D 0xB4 0xED 0x8A 0xB8. But it could be encoded in other ways, such as the code units 0xC774 0xD2B8 (UTF-16). It depends on your system. The point is that this string is 2 Unicode characters long. The language does not care about %. URL parsers do.
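As a rough illustration of the point above, here is a minimal Python sketch. Python is only a stand-in: it happens to support the same \u escape in string literals, and the variable names are made up for the example. It shows one string of 2 characters producing different byte sequences depending on the encoding chosen.

```python
# The \u escapes are resolved by the language when it reads the literal,
# so the string holds two characters, not twelve.
s = "\uc774\ud2b8"
print(len(s))  # 2

# The same two characters encoded two different ways.
utf8 = s.encode("utf-8")
utf16 = s.encode("utf-16-be")  # big-endian, no byte-order mark

print(utf8.hex(" "))   # ec 9d b4 ed 8a b8
print(utf16.hex(" "))  # c7 74 d2 b8
```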
% encoding is completely different. It is the percent-encoding mechanism for URIs defined by RFC 3986. When your compiler encounters "%uc774%ud2b8", it will encode it as "%" "u" "c" "7" "7" "4", etc. (Typically it will encode each of these in UTF-8, but it depends on the system.) Most languages do not treat % as special. \u (and \ generally) is part of the language. % is not. So this string is 12 Unicode characters long.
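The difference can be checked directly. The following is a small sketch in Python, used here only because it treats \u and % the same way as described above; the variable names are illustrative.

```python
escaped = "\uc774\ud2b8"   # \u is handled by the language: 2 characters
percent = "%uc774%ud2b8"   # % means nothing to the language: 12 characters

print(len(escaped))  # 2
print(len(percent))  # 12

# The first six characters are stored literally, one by one.
print(list(percent[:6]))  # ['%', 'u', 'c', '7', '7', '4']

# Encoded as UTF-8, each of those ASCII characters is a single byte.
percent_bytes = percent.encode("utf-8")
print(len(percent_bytes))  # 12
```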
In order for iOS to convert a string to an NSURL, the string must be correctly encoded. That may include percent-encoding in some parts of the URL and may forbid percent-encoding in other parts of the URL. Which characters may or must be percent-encoded can also differ between parts of the URL. The rules are spelled out in RFC 3986.
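To make the "different rules in different parts" idea concrete, here is a hedged sketch using Python's standard urllib.parse rather than the iOS APIs. The host example.com, the path, and the query key q are placeholders invented for the example. Note that the output uses the form RFC 3986 defines, % followed by two hex digits per byte of the UTF-8 encoding, not a %uXXXX form.

```python
from urllib.parse import quote, urlsplit

text = "\uc774\ud2b8"

# Across a whole path, "/" is a separator and must stay unescaped
# (quote() treats "/" as safe by default).
path = quote("/docs/" + text)

# Inside a single path segment, a literal "/" is data and must be escaped.
segment = quote(text + "/v2", safe="")

# Inside a query value, "&" and "=" are data and must be escaped too.
query_value = quote("a&b=" + text, safe="")

# example.com and the key "q" are placeholders for this sketch.
url = f"https://example.com/{segment}?q={query_value}"
parts = urlsplit(url)

print(path)         # /docs/%EC%9D%B4%ED%8A%B8
print(parts.path)   # /%EC%9D%B4%ED%8A%B8%2Fv2
print(parts.query)  # q=a%26b%3D%EC%9D%B4%ED%8A%B8
```

The same input text ends up escaped differently depending on where it sits in the URL, which is why encoding the whole string in one pass tends to give wrong results.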