Aigori - 6 months ago
iOS Question

UTF-8 encoding prefix notation: "percent (%)" vs "backslash (\)"

I am wondering about the prefix used in UTF-8 code notation. In some cases, it is written using the backslash symbol (\). However, I can also find codes written with the percent symbol (%), e.g. "\uc774\ud2b8" versus "%uc774%ud2b8".

Modern browsers handle either form without problems. But when I use the percent notation, NSURL cannot recognize the codes, and I get back an object containing a null URL.

What is the right notation for UTF-8 codes, and how can I solve the problem when I use NSURL with a percent-prefixed URL string?

EDIT: My wording was wrong. It is correct to refer to the Unicode character set rather than UTF-8 encoding.


This has nothing to do with UTF-8. The \u notation is used by various languages (e.g. C, Java) to write Unicode characters into string literals. When the string "\uc774\ud2b8" is encountered, it will generally be encoded in UTF-8, which means it becomes the byte sequence 0xEC 0x9D 0xB4 0xED 0x8A 0xB8. But it could be encoded other ways, such as the UTF-16 code units 0xC774 0xD2B8. It depends on your system. The point is that this string is 2 Unicode characters long. The language does not care about %. URL parsers do.
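As a quick check of the character count and byte sequences above, here is a minimal Python sketch. Python is used only for illustration because its "\u" escape behaves like C's and Java's in this respect; it is not the asker's iOS code. It shows that the literal holds two characters, and how those same two characters serialize as UTF-8 and as big-endian UTF-16.

```python
# The "\u" escape is processed by the language when the literal is parsed:
# each \uXXXX becomes a single Unicode character.
s = "\uc774\ud2b8"

# Two characters (code points), regardless of how they are later stored.
char_count = len(s)                      # 2
code_points = [hex(ord(c)) for c in s]   # ['0xc774', '0xd2b8']

# The same two characters serialized with two different encodings.
utf8_bytes = s.encode("utf-8")           # 6 bytes
utf16_bytes = s.encode("utf-16-be")      # 4 bytes: 2 code units, big-endian

print(char_count, code_points)
print(utf8_bytes.hex(" ").upper())       # EC 9D B4 ED 8A B8
print(utf16_bytes.hex(" ").upper())      # C7 74 D2 B8
```

The string is the same in both cases; only the byte serialization differs, which is why the character count and the byte count should not be confused.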

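The % encoding is completely different. It is the percent-encoding mechanism defined by RFC 3986, which represents one byte in a URI as "%" followed by exactly two hex digits; it is used for reserved characters and for non-ASCII data (encoded as UTF-8 bytes). When your compiler encounters "%uc774%ud2b8", it will encode it as "%" "u" "c" "7" "7" "4", etc. (Typically it will encode each of these in UTF-8, but it depends on the system.) Most languages do not treat % as special: \u (and \ generally) is part of the language, and % is not. So this string is 12 Unicode characters long. Note also that "%uc774" is not valid RFC 3986 percent-encoding, because "u" is not a hex digit; the percent-encoded form of these two characters is their UTF-8 bytes, %EC%9D%B4%ED%8A%B8.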
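The contrast with a plain string containing % can be sketched the same way, again in Python for illustration only. The is_valid_percent_encoding helper is hypothetical, written here just to check the RFC 3986 rule that every "%" must be followed by two hex digits. quote and unquote are Python's standard-library percent-encoding functions and use UTF-8 by default.

```python
import re
from urllib.parse import quote, unquote

# No "\" escapes here, so the language stores "%", "u", "c", "7", ...
# as ordinary characters.
raw = "%uc774%ud2b8"
raw_len = len(raw)  # 12 characters

def is_valid_percent_encoding(text: str) -> bool:
    """Hypothetical helper: True if every "%" is followed by two hex digits,
    as the RFC 3986 pct-encoded rule ("%" HEXDIG HEXDIG) requires."""
    return re.search(r"%(?![0-9A-Fa-f]{2})", text) is None

raw_valid = is_valid_percent_encoding(raw)  # False: "u" is not a hex digit

# Proper percent-encoding of the two characters: their UTF-8 bytes as %XX.
encoded = quote("\uc774\ud2b8")                     # "%EC%9D%B4%ED%8A%B8"
encoded_valid = is_valid_percent_encoding(encoded)  # True
decoded = unquote(encoded)                          # the original 2 characters

print(raw_len, raw_valid)
print(encoded, encoded_valid, decoded == "\uc774\ud2b8")
```

A strict URL parser rejecting the first string and accepting the second is consistent with this check.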

In order for iOS to convert a string to an NSURL, the string must be correctly encoded. That may include percent-encoding in some parts of the URL and may forbid percent-encoding in other parts of the URL (and which characters may or must be percent-encoded can be different in different parts of the URL). The rules are spelled out in RFC 3986.
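As a rough sketch of how the rules differ by URL component, the Python snippet below encodes a path and a query value with different sets of characters left unencoded. The host example.com, the parameter name q, and the build_url helper are hypothetical. On iOS, Foundation's addingPercentEncoding(withAllowedCharacters:) plays a similar role, with predefined sets such as .urlPathAllowed and .urlQueryAllowed (note that .urlQueryAllowed still permits "&" and "=", so a single query value may need a narrower set).

```python
from urllib.parse import quote

# Hypothetical host, used for illustration only.
BASE = "https://example.com"

def build_url(path: str, query_value: str) -> str:
    """Hypothetical helper: percent-encode each component by its own rules."""
    # In a path, "/" separates segments, so it is left unencoded.
    encoded_path = quote(path, safe="/")
    # In a query value, "&" and "=" must be encoded so they are not read as
    # delimiters. safe="" also encodes "/", which RFC 3986 would allow in a
    # query; that is a deliberately conservative choice.
    encoded_value = quote(query_value, safe="")
    return f"{BASE}{encoded_path}?q={encoded_value}"

url = build_url("/docs/\uc774\ud2b8", "a&b=c/d")
print(url)  # https://example.com/docs/%EC%9D%B4%ED%8A%B8?q=a%26b%3Dc%2Fd
```

Encoding each component separately, before joining them, avoids having to guess afterwards which "/", "&" or "=" characters are delimiters and which are data.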