Xeoncross - 11 months ago
PHP Question

PHP: is urlencode() a safe way to allow valid UTF-8 strings in the URL?

I have user-submitted tags that can be any type of (valid) UTF-8 string. I want to know if it is safe to include them in the URL merely by running them through urlencode().


In other words, is urlencode() safe to use for valid UTF-8 strings?
(By valid I mean I have already force-encoded them to UTF-8.)

Answer Source

urlencode does not depend on a specific character encoding. It works on the raw bytes of the string: ASCII letters, digits, and the characters -, _ and . pass through unchanged, a space becomes +, and every other byte, including every non-ASCII byte (0x80–0xFF), is replaced by % followed by its two-digit hexadecimal value.
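To illustrate the byte-wise behaviour, here is a minimal sketch that encodes the same word in two different character encodings. The sample strings and variable names are assumptions made for the example; the bytes are written as escapes so the result does not depend on how the file is saved.

```php
<?php
// "café" as UTF-8: é is the two bytes 0xC3 0xA9.
$utf8 = "caf\xC3\xA9";
// "café" as ISO-8859-1: é is the single byte 0xE9.
$latin1 = "caf\xE9";

// urlencode never inspects the encoding; it escapes each byte on its own.
$encodedUtf8   = urlencode($utf8);   // caf%C3%A9
$encodedLatin1 = urlencode($latin1); // caf%E9

echo $encodedUtf8, "\n", $encodedLatin1, "\n";

// Decoding gives back exactly the original bytes.
$roundTripOk = urldecode($encodedUtf8) === $utf8; // true
```

Whichever encoding goes in is the encoding that comes back out after decoding, so the receiving side still has to know the tags are UTF-8.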

Now to your question: yes, urlencode makes any string, in any character encoding, safe to use in a URL, but only in the query part. That is because urlencode formats its input according to application/x-www-form-urlencoded, which differs from “normal” percent-encoding in how the space is encoded: application/x-www-form-urlencoded replaces spaces with +, while “normal” percent-encoding replaces them with %20.
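The difference in how the space is handled can be seen in a short sketch. The tag value and variable names are made up for the example.

```php
<?php
// Hypothetical tag "naïve café", written as UTF-8 bytes.
$tag = "na\xC3\xAFve caf\xC3\xA9";

$form    = urlencode($tag);    // na%C3%AFve+caf%C3%A9   (space becomes +)
$percent = rawurlencode($tag); // na%C3%AFve%20caf%C3%A9 (space becomes %20)

echo $form, "\n", $percent, "\n";

// The matching decoders differ in the same way:
// urldecode turns + back into a space, rawurldecode leaves + alone.
$plusAsSpace   = urldecode("a+b");    // "a b"
$plusUntouched = rawurldecode("a+b"); // "a+b"
```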

If you want the “normal” percent-encoding, use rawurlencode instead.
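As a rough sketch of how the two functions might be used for tags, the helpers below build one URL with the tag in the path and one with the tag in the query. The function names, the example.com host, and the route layout are all assumptions for illustration, not part of any real application.

```php
<?php
// Hypothetical helper: the tag is a path segment, so use rawurlencode.
function tagPathUrl(string $tag): string
{
    return 'https://example.com/tags/' . rawurlencode($tag);
}

// Hypothetical helper: the tag is a query value, so urlencode is fine.
function tagQueryUrl(string $tag): string
{
    return 'https://example.com/search?tag=' . urlencode($tag);
}

$tag = "c/c++ tips";

echo tagPathUrl($tag), "\n";  // https://example.com/tags/c%2Fc%2B%2B%20tips
echo tagQueryUrl($tag), "\n"; // https://example.com/search?tag=c%2Fc%2B%2B+tips
```

Both functions escape the / and + inside the tag, so a tag cannot introduce an extra path segment or be mistaken for an encoded space.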