I have a large HTML data string separated into small chunks. I am trying to write a PowerShell script to remove all the HTML tags, but am finding it difficult to find the right regex pattern.
<p>This is an example</br>of various <span style="color: #445444">html content</span>
$string -replace '\<([^\)]+)\>',''
For a pure regex, it should be as easy as
$string -replace '<[^>]+>',''
Note that this could fail with certain HTML comments or the contents of
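To make the difference concrete, here is a small sketch that runs both patterns against the sample from the question. The `$tricky` string is a made-up input, not something from the original data, chosen only to show one way a comment can trip up the regex.

```powershell
# Sample markup taken from the question.
$string = '<p>This is an example</br>of various <span style="color: #445444">html content</span>'

# The pattern from the question: [^\)]+ excludes ')' rather than '>',
# so the greedy match runs from the first '<' to the last '>'.
$attempt = $string -replace '\<([^\)]+)\>', ''

# The suggested pattern: '<', one or more characters that are not '>', then '>'.
$plain = $string -replace '<[^>]+>', ''
$plain          # This is an exampleof various html content

# Hypothetical failure case: a '>' inside a comment ends the match early.
$tricky = 'before<!-- a > b -->after'
$trickyResult = $tricky -replace '<[^>]+>', ''
$trickyResult   # before b -->after
```

On this sample the original pattern swallows the whole string because it begins and ends with a tag. Also note that removing `</br>` leaves no space behind, so "example" and "of" run together; replacing tags with a space and then collapsing whitespace may suit your data better.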
Instead, you could use the HTML Agility Pack, which is designed for use in .NET code. I've used it successfully in PowerShell before:
Add-Type -Path 'C:\packages\HtmlAgilityPack.1.4.6\lib\Net40-client\HtmlAgilityPack.dll'
$doc = New-Object HtmlAgilityPack.HtmlDocument
$doc.LoadHtml($string)
$doc.DocumentNode.InnerText
HTML Agility Pack works well with non-perfect HTML.