Stuart Helwig - 2 months ago
ASP.NET (C#) Question

How do you convert Html to plain text?

I have snippets of Html stored in a table. Not entire pages, no tags or the like, just basic formatting.

I would like to be able to display that Html as text only, no formatting, on a given page (actually just the first 30 - 50 characters but that's the easy bit).

How do I place the "text" within that Html into a string as straight text?

So this piece of code:

<b>Hello World.</b><br/><p><i>Is there anyone out there?</i><p>


Becomes:

Hello World. Is there anyone out there?

Answer

If you are talking about tag stripping, it is relatively straightforward as long as you don't have to worry about things like <script> tags. If all you need to do is display the text without the tags, you can accomplish that by replacing every match of this regular expression with an empty string or a space:

<[^>]*>
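As a minimal sketch of how that pattern could be applied in C#: the class and method names below (HtmlText, StripTags) are hypothetical, and the choices to replace each tag with a space, decode entities afterwards, and collapse whitespace are assumptions made to reproduce the output asked for in the question, not part of the original answer.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Hypothetical helper: strips tags from a small HTML snippet.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Replace each tag with a space so the text on either side of
        // <br/> or <p> does not run together.
        string text = Regex.Replace(html, "<[^>]*>", " ");

        // Decode entities such as &amp; only after the tags are gone,
        // so an escaped "&lt;b&gt;" is not mistaken for a real tag.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace left behind by the replacements.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

With the snippet from the question, HtmlText.StripTags("<b>Hello World.</b><br/><p><i>Is there anyone out there?</i><p>") should give "Hello World. Is there anyone out there?". One trade-off of substituting a space is that a tag in the middle of a word (Hel<b>lo</b>) splits the word; replacing with an empty string avoids that but joins text across <br/> and <p>.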

If you do have to worry about <script> tags and the like, then you'll need something a bit more powerful than regular expressions, because you need to track state: something more like a Context Free Grammar (CFG). That said, you might be able to accomplish it with 'Left To Right' or non-greedy matching.
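A rough sketch of the non-greedy approach, assuming the snippets are reasonably well formed: remove each <script> element together with its contents first, then strip the remaining tags. The names here (HtmlTextWithScripts, Strip) are hypothetical, and treating <style> the same way as <script> is an assumption. This is still not a parser, so input such as a script whose string literal contains "</script>" will defeat it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlTextWithScripts
{
    // Non-greedy .*? stops at the first matching closing tag;
    // Singleline lets '.' match newlines inside the script body.
    private static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // The simple tag pattern from the answer.
    private static readonly Regex AnyTag = new Regex("<[^>]*>");

    // Hypothetical helper: drops script/style elements, then all other tags.
    public static string Strip(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Drop whole script/style elements first, contents included.
        string text = ScriptOrStyle.Replace(html, " ");

        // Then strip whatever tags are left.
        text = AnyTag.Replace(text, " ");

        // Collapse the whitespace introduced by the replacements.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

The order matters: running the simple tag pattern first would leave the script source behind as if it were text.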

If you can use regular expressions, there are many web pages out there with good info.

If you need the more complex behaviour of a CFG, I would suggest using a third-party tool. Unfortunately, I don't know of a good one to recommend.