Transmutive Daisies - 5 months ago

HtmlAgilityPack SelectNodes always returns null

I heard good things about the HtmlAgilityPack library, so I thought I'd give it a try, but I have had absolutely zero success with it. I've been trying to figure this out for months. No matter what I do, I cannot get this code to give me anything other than null. I tried following an example, but I do not get the same results and I cannot explain why.

I load the file and then run SelectNodes to select all hyperlinks, but it always comes back empty. I've tried selecting all kinds of nodes (div, p, a, anything and everything) and the result is always the same. I've tried using doc.Descendants, and I've tried different source files, both local and on the web, and nothing I do ever returns an actual result.

I must have overlooked something important, but I cannot figure out what it is. What could I be missing?


    public string GetSource()
    {
        string result = "";
        try
        {
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            if (!System.IO.File.Exists("htmldoc.html"))
                throw new Exception("Unable to load doc");

            doc.LoadHtml("htmldoc.html"); // copied locally to bin folder, confirmed it found the file and loaded it

            HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a"); // Always returns null, regardless of what I put in here

            if (nodes != null)
            {
                foreach (HtmlNode item in nodes)
                    result += item.InnerText;
            }
            else
            {
                // Every. Single. Time.
                throw new Exception("No matching nodes found in document");
            }

            return result;
        }
        catch (Exception ex)
        {
            return ex.ToString();
        }
    }

The source HTML file 'htmldoc.html' I'm using looks like this:

    <title>Testing HTML Agility Pack</title>
    <div id="div1">
        <a href="div1-a1">Link 1 inside div1</a>
        <a href="div1-a2">Link 2 inside div1</a>
    </div>
    <a href="a3">Link 3 outside all divs</a>
    <div id="div2">
        <a href="div2-a1">Link 1 inside div2</a>
        <a href="div2-a2">Link 2 inside div2</a>
    </div>


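To load a file, you should use the Load method. LoadHtml is for strings that contain HTML. Your call parses the literal text "htmldoc.html" as markup, so the document contains no elements at all and SelectNodes returns null.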
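
For comparison, here is a minimal sketch of the two calls side by side. It assumes the HtmlAgilityPack NuGet package is referenced. The class name LinkTextDemo and the helper CollectLinkText are made up for illustration and are not part of the library. The inline markup is a shortened stand-in for the question's file.

```csharp
using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

public static class LinkTextDemo
{
    // Hypothetical helper: concatenates the inner text of every <a> element.
    public static string CollectLinkText(HtmlDocument doc)
    {
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a");
        if (nodes == null)
            return string.Empty;

        var text = new StringBuilder();
        foreach (HtmlNode node in nodes)
            text.Append(node.InnerText);
        return text.ToString();
    }

    public static void Main()
    {
        // LoadHtml treats its argument as markup.
        var fromString = new HtmlDocument();
        fromString.LoadHtml("<div><a href=\"a1\">Link 1</a><a href=\"a2\">Link 2</a></div>");
        Console.WriteLine(CollectLinkText(fromString)); // prints "Link 1Link 2"

        // Passing a file name to LoadHtml parses the name itself as plain text, so no <a> is found.
        var wrong = new HtmlDocument();
        wrong.LoadHtml("htmldoc.html");
        Console.WriteLine(CollectLinkText(wrong).Length); // prints 0

        // Load treats its argument as a path and reads the file from disk.
        if (File.Exists("htmldoc.html"))
        {
            var fromFile = new HtmlDocument();
            fromFile.Load("htmldoc.html");
            Console.WriteLine(CollectLinkText(fromFile));
        }
    }
}
```

For pages on the web, the library's HtmlWeb class has a Load(url) method that downloads and parses in one step, so LoadHtml is only needed when you already hold the markup in a string.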