bearaman bearaman - 2 months ago 15
C# Question

Regular Expression to clear attributes from a html tag

I have a pretty simple reg ex question. My HTML tag looks like the following:

<body lang=EN-US link=blue vlink=purple>

I want to clear all attributes and just return:

<body>

There are a number of other HTML tags whose attributes I'd like to clear, so I hope to reuse the solution. How can I do this with a regular expression?


Use HtmlAgilityPack like this:

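    public string RemoveAllAttributesFromEveryNode(string html)
    {
        var htmlDocument = new HtmlAgilityPack.HtmlDocument();
        htmlDocument.LoadHtml(html);
        // "//*" selects every element; SelectNodes returns null when nothing matches.
        foreach (var eachNode in htmlDocument.DocumentNode.SelectNodes("//*"))
            eachNode.Attributes.RemoveAll();
        html = htmlDocument.DocumentNode.OuterHtml;
        return html;
    }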

Call this method, passing in the HTML that you want to remove all attributes from.

will help you a lot with this.

Don't use a regex for HTML files that may contain scripts: in JavaScript, the characters < and > are operators, not tag delimiters. A regex will probably match these operators as if they were tags, which will completely mess up the document.
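
To make that failure concrete, here is a minimal sketch. The pattern and the names `RegexPitfallDemo`, `NaiveTag` and `StripAttributesNaively` are made up for illustration and are not from the question; the pattern is just one plausible naive attempt. It handles the tag from the question, but it also rewrites a comparison inside an inline script:

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexPitfallDemo
{
    // Hypothetical naive pattern: "<", a tag name, anything up to the next ">".
    private static readonly Regex NaiveTag = new Regex(@"<(\w+)[^>]*>");

    // Replaces every match with just "<name>", dropping whatever followed the name.
    public static string StripAttributesNaively(string html)
    {
        return NaiveTag.Replace(html, "<$1>");
    }

    public static void Main()
    {
        // The tag from the question comes out as intended.
        Console.WriteLine(StripAttributesNaively("<body lang=EN-US link=blue vlink=purple>"));
        // prints: <body>

        // Inline JavaScript: "<n && n>" looks like a tag to the pattern,
        // so the loop condition is silently rewritten.
        Console.WriteLine(StripAttributesNaively("<script>for (i=0; i<n && n>0; i++) {}</script>"));
        // prints: <script>for (i=0; i<n>0; i++) {}</script>
    }
}
```

The second call shows the problem: `i<n && n>0` becomes `i<n>0`, so the script no longer means what it did. A parser-based approach avoids this because it knows where script content starts and ends.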