George H. - 1 month ago
C# Question

HtmlAgilityPack enumerate all classes

I have been dealing with HTML a lot and have always used Regex to get my results. Every time I look for help, though, everyone recommends using an HTML parser such as HtmlAgilityPack.

I just tried it and, man, it is too much for me at the moment.
This is how I tried to enumerate the spans of the HTML code:

private static string _InetReadEx(string sUrl)
{
    try
    {
        HtmlWeb website = new HtmlWeb();
        HtmlDocument htmlDoc = website.Load(sUrl);

        var allElementsWithClassFloat = htmlDoc.DocumentNode.SelectNodes("//div[contains(@class,'pid')]");
        for (int i = 0; i < allElementsWithClassFloat.Count; i++)
        {
            Console.WriteLine(allElementsWithClassFloat[i].InnerText);
        }

        return aRet;
    }
    catch (Exception ex)
    {
        throw ex;
    }
}


and I am getting the error:
Expression must evaluate to a node-set


I have uploaded the HTML file here because it was too big to include in the post.
I need to enumerate all the classes that contain "pid".

Answer

I think you need something like this:

// Requires: using System; using System.Collections.Generic;
private static List<string> _InetReadEx(string sUrl)    // Returns a list of strings
{
    var aRet = new List<string>();                      // Result list
    var website = new HtmlAgilityPack.HtmlWeb();        // Init the loader object
    var htmlDoc = website.Load(sUrl);                   // Load and parse the doc from the URL

    // Get all nodes whose class attribute value contains "pid"
    var allElementsWithClassFloat = htmlDoc.DocumentNode.SelectNodes("//*[contains(@class,'pid')]");
    if (allElementsWithClassFloat != null)              // SelectNodes returns null when nothing matches
    {
        foreach (var node in allElementsWithClassFloat)
        {
            var text = node.InnerText;
            if (!string.IsNullOrWhiteSpace(text) &&     // Skip blank/null text
                !aRet.Contains(text))                   // Skip text already collected
            {
                aRet.Add(text);                         // Add to result
                Console.WriteLine(text);                // Demo line
            }
        }
    }

    // No try/catch here: "catch (Exception ex) { throw ex; }" only resets the stack trace,
    // so exceptions are left to propagate to the caller unchanged.
    return aRet;
}

The XPath is //*[contains(@class,'pid')]:

  • //* - select every element node in the document that...
  • [contains( - satisfies the condition that it contains...
  • @class,'pid' - the substring pid anywhere inside its class attribute value
  • )] - end of the contains condition
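
Since the question asks for the classes themselves rather than the inner text, here is a minimal sketch of how the same XPath could be used to collect the matching class names. It assumes the HtmlAgilityPack NuGet package is referenced. The PidClassDemo class, the FindClassNames helper and the sample HTML are made up for illustration and are not from the uploaded file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class PidClassDemo
{
    // Hypothetical helper: returns the distinct class names that contain `fragment`.
    // Assumes `fragment` has no single quote, since it is pasted into the XPath literal.
    public static List<string> FindClassNames(string html, string fragment)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);                              // Parse from a string instead of a URL

        // Same XPath as in the answer: any element whose class attribute contains the fragment.
        var nodes = doc.DocumentNode.SelectNodes("//*[contains(@class,'" + fragment + "')]");
        if (nodes == null)                               // SelectNodes returns null when nothing matches
        {
            return new List<string>();
        }

        var separators = new[] { ' ', '\t', '\r', '\n' };
        return nodes
            // A class attribute may hold several names, so split it on whitespace.
            .SelectMany(n => n.GetAttributeValue("class", "")
                              .Split(separators, StringSplitOptions.RemoveEmptyEntries))
            // Keep only the names that actually contain the fragment.
            .Where(name => name.Contains(fragment))
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample markup for the demo.
        const string html =
            "<div class=\"post pid_101\">first</div>" +
            "<span class=\"pid_102 highlight\">second</span>" +
            "<p class=\"rapid\">substring match</p>" +
            "<div class=\"other\">ignored</div>";

        foreach (var name in FindClassNames(html, "pid"))
        {
            Console.WriteLine(name);
        }
    }
}
```

Note that contains() is a plain substring test, so a class such as rapid is matched as well as pid_101 and pid_102. If only names that start with pid are wanted, the Where filter could use name.StartsWith(fragment) instead.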