Leary Leary - 10 months ago 119
C# Question

Parsing HTML in C# ASP.NET

Here's my sample HTML:

<html>
<table class="test" border="0" >
<tr bgColor="#e8f4ff">
<td width="50%" align="right">
<b>Invoice ID:</b>
</td>
<td width="50%">
<b>
1622579
</b>
</td>
</tr>
<tr bgColor="#e8f4ff">
<td align="right">
<b>Code:</b>
</td>
<td>
<b>
20475
</b>
</td>
</tr>
</table>
</html>


There's no ID, so I can't use SelectNodes().
How can I get the Code: 20475 using HtmlAgilityPack or regex?

Answer

With the latest HtmlAgilityPack you can rely on the document structure alone. This will not be very resilient to changes in the HTML, so you should strongly consider adding appropriate ids (if this is your own HTML):

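// Requires: using System.Linq; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(@"test.html");

var tds = doc.DocumentNode.Descendants("td").ToArray();
string codeValue = "";

// Find the cell labelled "Code:" and read the cell that follows it.
for (int i = 1; i < tds.Length; i++)
{
    // Element("b") returns null when a cell has no <b> child.
    // Trim() removes the newlines that surround the value in the sample.
    if (tds[i - 1].Element("b")?.InnerText.Trim() == "Code:")
    {
        codeValue = tds[i].Element("b")?.InnerText.Trim() ?? "";
        break;
    }
}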
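
As a possible alternative to the loop above: SelectNodes() and SelectSingleNode() take an XPath expression, and XPath can match on element text, so an id is not strictly required. The following is a minimal sketch under that assumption, not a definitive implementation. The class name InvoiceTable and the method FindValue are hypothetical names introduced here for illustration, and the sketch assumes the label text contains no single quote.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original answer.
public static class InvoiceTable
{
    // Returns the trimmed text of the cell that follows the cell whose <b>
    // label equals `label`, or null when no such cell is found.
    public static string FindValue(string html, string label)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Pick the <td> whose <b> child has the given text, then step to the
        // next <td> in the same row. Assumes `label` has no single quote.
        string xpath = "//td[b[normalize-space(.)='" + label + "']]"
                     + "/following-sibling::td[1]";
        HtmlNode cell = doc.DocumentNode.SelectSingleNode(xpath);

        // InnerText includes the newlines around the value, so trim it.
        return cell?.InnerText.Trim();
    }
}
```

A call such as InvoiceTable.FindValue(File.ReadAllText("test.html"), "Code:") would be one way to use it. Like the loop, this still depends on the label and its value sitting in adjacent cells of the same row.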