Santiago Trejo - 5 months ago
HTML Question

Remove HTML nodes from HTTP Request

I have some HTML code stored in a string variable, returned by an HttpWebRequest:

<html>
<head>
<div>Lots of scripts and libraries</div>
</head>
<body>
<div>Some very useful data</div>
</body>
<footer>
<div>Not interesting stuff</div>
</footer>
</html>


How can I remove all the unnecessary nodes and end up with this:

<body>
<div>Some very useful data</div>
</body>

Answer

The easiest way is to use HtmlAgilityPack to grab just the body tag.

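// Requires the HtmlAgilityPack NuGet package.
var document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(html); // html is the string returned by the HttpWebRequest

// SelectSingleNode returns null when the document has no <body> element.
HtmlAgilityPack.HtmlNode body = document.DocumentNode.SelectSingleNode("//body");
string bodyHtml = body?.OuterHtml; // the "<body>...</body>" markup as a string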

From there, you can use HtmlAgilityPack to further parse the body node for more detail.
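As a rough sketch of what "further parsing" could look like, the example below collects the text of every div inside the body. The class name BodyParser and the method GetDivTexts are hypothetical names chosen for illustration, and the sample HTML in Main is a shortened version of the markup from the question. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class BodyParser
{
    // Hypothetical helper: returns the trimmed text of every <div> inside <body>.
    public static List<string> GetDivTexts(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var texts = new List<string>();
        HtmlNode body = document.DocumentNode.SelectSingleNode("//body");
        if (body == null)
        {
            return texts; // no <body> element in the input
        }

        // ".//div" is evaluated relative to body.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection divs = body.SelectNodes(".//div");
        if (divs == null)
        {
            return texts;
        }

        foreach (HtmlNode div in divs)
        {
            texts.Add(div.InnerText.Trim());
        }
        return texts;
    }

    public static void Main()
    {
        string html = "<html><head></head><body><div>Some very useful data</div></body></html>";
        foreach (string text in GetDivTexts(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

The leading dot in ".//div" matters: a plain "//div" would search the whole document again, including the head and footer, rather than only the body node.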