I heard good things about the HTML Agility Pack library, so I thought I'd give it a try, but I have had absolutely zero success with it. I've been trying to figure this out for months. No matter what I do, I cannot get this code to give me anything other than null. I tried following this example (http://www.c-sharpcorner.com/uploadfile/9b86d4/getting-started-with-html-agility-pack/), but I do not get the same results and I cannot explain why.
I load the file and then run SelectNodes to select all hyperlinks, but it never returns any nodes. I've tried selecting all kinds of nodes (div, p, a, everything and anything) and the result is always the same. I've tried using doc.Descendants, and I've tried different source files, both local and on the web, and nothing I do ever returns an actual result.
I must have overlooked something important, but I cannot figure out what it is. What could I be missing?
public string GetSource()
{
    string result = "";
    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    try
    {
        if (!System.IO.File.Exists("htmldoc.html")) // guard condition assumed; it was lost from the post
            throw new Exception("Unable to load doc");
        doc.LoadHtml("htmldoc.html"); // copied locally to bin folder, confirmed it found the file and loaded it
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a"); // Always returns null, regardless of what I put in here
        if (nodes != null)
            foreach (HtmlNode item in nodes)
                result += item.InnerText;
        else
            // Every. Single. Time.
            throw new Exception("No matching nodes found in document");
    }
    catch (Exception ex)
    {
        result = ex.Message; // catch body assumed; it was lost from the post
    }
    return result;
}
<title>Testing HTML Agility Pack</title>
<a href="div1-a1">Link 1 inside div1</a>
<a href="div1-a2">Link 2 inside div1</a>
<a href="a3">Link 3 outside all divs</a>
<a href="div2-a1">Link 1 inside div2</a>
<a href="div2-a2">Link 2 inside div2</a>
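To load a file, you should use Load, which takes a path:
doc.Load("htmldoc.html");
LoadHtml is for strings that already contain HTML. Given "htmldoc.html", it parses the file name itself as the markup, and that text has no elements, so SelectNodes returns null.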
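
To make the difference concrete, here is a minimal sketch that runs the same //a query after LoadHtml and after Load. The LinkDemo class, the LinkTexts helper, and the temp file name htmldoc-demo.html are made up for illustration. It assumes the HtmlAgilityPack package is referenced and that your build of it exposes the Load(string path) overload.

```csharp
using System;
using System.IO;
using System.Linq;
using HtmlAgilityPack;

public static class LinkDemo
{
    // Hypothetical helper: inner text of every <a> element, or an empty array if none match.
    public static string[] LinkTexts(HtmlDocument doc)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a");
        return nodes == null
            ? Array.Empty<string>()
            : nodes.Select(n => n.InnerText).ToArray();
    }

    public static void Main()
    {
        const string html =
            "<html><body><a href=\"a3\">Link 3 outside all divs</a></body></html>";

        // Write a small test file; the file name is only an example.
        string path = Path.Combine(Path.GetTempPath(), "htmldoc-demo.html");
        File.WriteAllText(path, html);

        // LoadHtml treats its argument as markup, so the path is parsed as plain text.
        var fromString = new HtmlDocument();
        fromString.LoadHtml(path);
        Console.WriteLine("LoadHtml(path) links: " + LinkTexts(fromString).Length);

        // Load opens the file at the path and parses its contents.
        var fromFile = new HtmlDocument();
        fromFile.Load(path);
        Console.WriteLine("Load(path) links: " + string.Join(", ", LinkTexts(fromFile)));
    }
}
```

The first call should find no links and the second should find the one in the file. If the markup is already in memory, something like doc.LoadHtml(File.ReadAllText(path)) should behave much the same as Load for a simple file, although Load also handles encoding detection for you.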