I'm reading about XSS to educate myself on security while working with PHP. I'm referring to this article, in which they talk about XSS and some of the rules that should be adhered to.
Could someone explain Rules #0 and #1 for me? I understand some of what they are saying, but when they say untrusted data, do they mean data entered by the user?
I'm working on some forms and I'm trying to adhere to these rules to prevent XSS. The thing is, I never output anything to the user once the form is complete. All I do is process data and save it to text files. I've done some client-side and a lot of server-side validation, but I can't figure out what they mean by "never insert untrusted data except in allowed locations". Closing tags, like </>?
Rule #0 means that you should not output untrusted data in parts of your page where the browser expects to find instructions to run.
As shown in the article you linked, do not put user-generated data inside <script> tags. For example, this is a no-no:
<script>
    var usernameSpanTag = document.getElementById('username');
    usernameSpanTag.innerText = "Welcome back, "+<?=$username?>+"!";
</script>
Looks pretty safe, right? Well, what if your $username variable contains the following value:
""; console.log(document.cookie);//
So what your page actually sends to the browser is this:
<script>
    var usernameSpanTag = document.getElementById('username');
    usernameSpanTag.innerText = "Welcome back, "+""; console.log(document.cookie);//+"!";
</script>
This payload only logs the cookies, but a real attacker would send them to a server they control, so someone can easily steal your users' cookies and elevate their privileges. Now imagine that you're using similar code to, say, show which user created the latest post, updated via AJAX. That's a disaster waiting to happen if you do something like the above (and do not sanitize the username in the first place).
The same applies to <iframe> or any other tag that lets you run scripts or import resources. It also applies to comments: browsers ignore HTML comments, but some interpreters, like the JSP parser, handle them as template text and do not ignore their contents.
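Going back to the <script> example: if a value really has to end up inside a script block, one common approach (not something the article or the code above prescribes) is to let PHP's json_encode produce the JavaScript literal, with the JSON_HEX_* flags so that <, >, &, ' and " come out as \uXXXX escapes. This is only a minimal sketch, assuming PHP 7.3+ for JSON_THROW_ON_ERROR; jsLiteral is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not from the article): turn a PHP value into a
// JavaScript literal that can be echoed inside a <script> block.
function jsLiteral($value): string
{
    // The JSON_HEX_* flags emit <, >, &, ' and " as \uXXXX escapes, so the
    // value cannot close the string literal or the surrounding <script> tag.
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// The payload from the example above.
$username = '""; console.log(document.cookie);//';

// json_encode adds the surrounding quotes itself, so none are written here.
echo 'usernameSpanTag.innerText = "Welcome back, " + ' . jsLiteral($username) . ' + "!";', PHP_EOL;
```

With this, the payload arrives in the browser as an ordinary string and is displayed as text instead of being executed. It is still safer to keep untrusted data out of script blocks whenever you can.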
Rule #1 is pretty similar to Rule #0. If you're developing web applications, at some point you will have to output user-generated data, whether it is an email address, a username, a name, or whatever.
If you're developing a forum, you'll probably want to give your users some styling options for their text. Basic stuff like bold, underline and italics should suffice. If you want to get fancy, you may even let your users change the font.
An easy way to do that, without too many complications, is to let users write their own HTML if they choose to do so. But if you output your users' HTML unescaped, even in "safe" locations like between <p> tags, that's a disaster waiting to happen as well.
Because I can write:
Hey everybody, this is my first post
If you don't escape that input, people will only see:
Hey everybody, this is my first post!
So always escape the data. If you're using PHP, you can use htmlentities, or a template engine like Twig that automatically escapes the output for you.
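As a rough illustration of the htmlentities route, here is a small sketch. The helper name e and the <script>alert(1)</script> payload are made up for the example (the payload in the post above isn't shown), and the flags are a commonly used combination rather than anything the article mandates.

```php
<?php
// Hypothetical helper: escape untrusted text before placing it in HTML.
function e(string $untrusted): string
{
    // ENT_QUOTES also escapes single quotes; ENT_SUBSTITUTE replaces invalid
    // UTF-8 sequences instead of returning an empty string.
    return htmlentities($untrusted, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Made-up forum post containing a script tag.
$post = 'Hey everybody, this is my first post<script>alert(1)</script>!';

// The script tag is rendered as visible text instead of being executed.
echo '<p>' . e($post) . '</p>', PHP_EOL;
// Prints: <p>Hey everybody, this is my first post&lt;script&gt;alert(1)&lt;/script&gt;!</p>
```

Escaping at output time like this means the stored data stays untouched, and the same value can be escaped differently if it is later used in another context.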