Leary Leary - 3 years ago 234
C# Question

Parsing HTML in C# ASP.NET

Here's my sample HTML:

<table class="test" border="0" >
<tr bgColor="#e8f4ff">
<td width="50%" align="right">
<b>Invoice ID:</b>
<td width="50%">
<tr bgColor="#e8f4ff">
<td align="right">

There's no ID, so I can't use SelectNodes().
How can I get the code value (Code: 20475) using HtmlAgilityPack or a regex?

Answer

With the latest HtmlAgilityPack you can rely on the document structure alone. This will not be very resilient to changes in the HTML, so you should strongly consider adding appropriate ids (if this is your HTML anyway):

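// Requires: using System.Linq; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html); // assumption: html is a string holding the page markup

var tds = doc.DocumentNode.Descendants("td").ToArray();
string codeValue = "";

// Find the cell labelled "Code:" and read the value from the cell that follows it.
for (int i = 1; i < tds.Length; i++)
{
    var label = tds[i - 1].Element("b");
    var value = tds[i].Element("b");
    if (label != null && value != null && label.InnerText.Trim() == "Code:")
        codeValue = value.InnerText.Trim();
}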
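
If you do control the HTML and can add ids, the lookup no longer depends on cell order. Here is a minimal sketch of that approach. The id "code-value", the sample markup, and the helper name GetTextById are all made up for illustration and are not part of the original page:

```csharp
using System;
using HtmlAgilityPack;

static class CodeLookup
{
    // Hypothetical helper: returns the trimmed text of the element with the
    // given id, or an empty string when no such element exists.
    public static string GetTextById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Note the lowercase 'b' in GetElementbyId; it returns null if nothing matches.
        HtmlNode node = doc.GetElementbyId(id);
        return node != null ? node.InnerText.Trim() : "";
    }
}

class Program
{
    static void Main()
    {
        // Assumed markup: the value cell carries an id that the original HTML lacks.
        string html =
            "<table class=\"test\">" +
            "<tr><td align=\"right\"><b>Code:</b></td>" +
            "<td id=\"code-value\"><b>20475</b></td></tr>" +
            "</table>";

        Console.WriteLine(CodeLookup.GetTextById(html, "code-value"));
    }
}
```

Because the element is found by id, rows or cells can be reordered without breaking the lookup, which is what the loop over neighbouring td elements cannot guarantee.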
