I have user-submitted tags that can be any kind of (valid) UTF-8 string. I want to know if it is safe to include them in the URL merely by running them through urlencode.
urlencode does not depend on a specific character encoding. It just looks at the bytes, interprets them as ASCII characters, and percent-encodes any byte that is either outside the ASCII range (0x80–0xFF) or not allowed to appear unencoded in a URL.
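As a rough illustration of the byte-level behaviour described above, the sketch below encodes the same word stored in two different character encodings. The sample word and the variable names are assumptions made up for this example; the strings are written as explicit byte escapes so the result does not depend on how the source file is saved.

```php
<?php
// The same word, "café", as raw bytes in two encodings (illustrative values).
$utf8   = "caf\xC3\xA9"; // UTF-8: é is the two bytes C3 A9
$latin1 = "caf\xE9";     // ISO-8859-1: é is the single byte E9

// urlencode works byte by byte; it neither knows nor cares which encoding was used.
$encodedUtf8   = urlencode($utf8);
$encodedLatin1 = urlencode($latin1);

echo $encodedUtf8, "\n";   // caf%C3%A9
echo $encodedLatin1, "\n"; // caf%E9
```

Decoding with urldecode gives back exactly the original bytes in both cases, so the receiving side still has to know which character encoding those bytes are in.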
Now to your question: Yes, urlencode encodes any string, in any character encoding, so that it can be used safely, but only in the URL query! That is because urlencode formats its input according to application/x-www-form-urlencoded, which differs from “normal” percent encoding in how the space is encoded: in application/x-www-form-urlencoded, spaces are replaced by +, while “normal” percent encoding replaces them by %20.
If you want “normal” percent encoding, use rawurlencode instead.
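To make the difference in space handling concrete, here is a minimal sketch comparing urlencode with PHP's rawurlencode, which emits %20 for spaces. The tag value, the example.com URLs, and the variable names are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical user-submitted tag containing spaces and reserved characters.
$tag = "c++ & c#";

// Form encoding (application/x-www-form-urlencoded): space becomes "+".
$forQuery = urlencode($tag);    // c%2B%2B+%26+c%23

// Plain percent encoding: space becomes "%20".
$forPath = rawurlencode($tag);  // c%2B%2B%20%26%20c%23

// Illustrative URLs only; example.com and the route names are placeholders.
$searchUrl = "https://example.com/search?tag=" . $forQuery;
$tagUrl    = "https://example.com/tags/" . $forPath;

echo $searchUrl, "\n";
echo $tagUrl, "\n";
```

In this sketch the form-encoded value goes into the query string and the percent-encoded value into the path, which matches the distinction drawn above. A literal + in the tag is encoded as %2B by both functions, so it cannot be confused with an encoded space.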