In addition to this, a fair amount of valid user content will include special characters and XML constructs, so I'd like to avoid a white-list approach if possible. (Listing every allowable HTML element and attribute).
"Hello, I have a
problem with the <dog>
"Hi, this <b
You think that's it? Check this out.
Whatever approach you take, you definitely need to use a whitelist. It's the only way to even come close to being safe about what you're allowing on your site.
I'm not familiar with .NET, unfortunately, but you can check out stackoverflow's own battle with XSS (http://blog.stackoverflow.com/2008/06/safe-html-and-xss/) and the code that was written to parse HTML posted on this site: Archive.org link - obviously you might need to change this because your whitelist is bigger, but that should get you started.