Dimax Dimax - 6 months ago 19
HTML Question

Remove Style tag in HTML

I need to remove all style attributes completely from the given HTML code. I found the following regex to match the entire style attribute in the markup. It works fine for the given HTML in online regex testers.

*style\s*=\s*('|")[^\2]*?\2([^>]*)*


However, when run from C# code, it didn't work for the given HTML.

Here is the C# code:

Regex regex = new Regex("style\\s*=\\s*('|\")[^\\2]*?\\2([^>]*)", RegexOptions.IgnoreCase);

Answer

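The quote is captured by the first group, so the backreference should be \1, not \2. In .NET, a backreference to a group that has not captured anything never matches, which is why the pattern fails in C#. A backreference is also not recognized inside a character class, so match lazily up to the closing quote instead:

 style\s*=\s*('|").*?\1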
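
For completeness, here is a minimal sketch of applying a pattern of this kind from C# with Regex.Replace. The class and method names (StyleStripper, RemoveStyleAttributes) are made up for illustration. The leading \s+, the lazy .*? and RegexOptions.Singleline are my own choices rather than part of the pattern above. It assumes quoted attribute values only, and like any regex over HTML it will also hit text that merely looks like a style attribute (inside a script block, for example).

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class StyleStripper
{
    // \s+      : whitespace before the attribute, removed along with it;
    //            also keeps names like data-style="..." from matching
    // (['"])   : opening quote, captured as group 1
    // .*?\1    : lazily match up to the first repeat of that same quote
    // Singleline lets '.' match newlines inside multi-line values.
    private static readonly Regex StyleAttribute = new Regex(
        @"\s+style\s*=\s*(['""]).*?\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string RemoveStyleAttributes(string html)
    {
        return StyleAttribute.Replace(html, string.Empty);
    }
}
```

Calling StyleStripper.RemoveStyleAttributes on `<p style="color:red" class="a">x</p>` should leave `<p class="a">x</p>`. Unquoted values such as style=color:red are not handled by this sketch.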

Though I would use HtmlAgilityPack instead:

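   // Requires the HtmlAgilityPack NuGet package (using HtmlAgilityPack;)
   HtmlDocument doc = new HtmlDocument();
   doc.Load(yourStream);
   // Select every element that carries a style attribute;
   // SelectNodes returns null when nothing matches
   var elementsWithStyleAttribute = doc.DocumentNode.SelectNodes("//*[@style]");
   if (elementsWithStyleAttribute != null)
   {
       foreach (var element in elementsWithStyleAttribute)
       {
           element.Attributes["style"].Remove();
       }
   }
   // Save needs a target (stream, file path, or writer); yourOutputStream is a placeholder
   doc.Save(yourOutputStream);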