iShow iShow - 6 months ago 130
C# Question

How to get img/src or a/hrefs using Html Agility Pack?

I want to use the HTML Agility Pack to extract image sources and href links from an HTML page, but I don't know much about XML or XPath. I have read the help documents on many websites and still can't solve the problem. I'm using C# in Visual Studio 2005. My English isn't fluent, so I would sincerely appreciate it if someone could write some helpful code.


The first example on the home page does something very similar, but consider:

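 HtmlDocument doc = new HtmlDocument();
 doc.Load("file.htm"); // use doc.LoadHtml(htmlSource) if the HTML is a string rather than a file
 // Current releases expose the root as doc.DocumentNode (the older home page example used doc.DocumentElement).
 // SelectNodes can return null when nothing matches, so check before looping in real code.
 foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
 {
     string href = link.Attributes["href"].Value;
     // store href somewhere
 }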

So for img/@src, just replace each a with img, and href with src. You might even be able to simplify to:

 foreach (HtmlNode node in doc.DocumentNode
               .SelectNodes("//a/@href | //img/@src"))
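Here is a slightly fuller sketch of the combined query. It is not taken from the library's documentation. It assumes a recent Html Agility Pack release (with `DocumentNode` and `GetAttributeValue`), and it selects the elements rather than the attributes, because I am not sure that an attribute-axis XPath gives you attribute nodes back in every version. The sample HTML string and the class name are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

class LinkExtractor
{
    static void Main()
    {
        // Hypothetical input; use doc.Load("file.htm") to read from a file instead.
        string html = "<a href='page.htm'>p</a><img src='logo.png'><a name='x'>no href</a>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Select <a> elements that have an href and <img> elements that have a src.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href] | //img[@src]");
        if (nodes == null)
        {
            return; // nothing matched
        }

        foreach (HtmlNode node in nodes)
        {
            // Pick the attribute that carries the URL for this kind of element.
            string attributeName = node.Name == "img" ? "src" : "href";
            string url = node.GetAttributeValue(attributeName, "");
            Console.WriteLine(node.Name + ": " + url);
        }
    }
}
```

The code sticks to C# 2.0 syntax (no `var`, no LINQ), so it should compile in Visual Studio 2005, provided the Html Agility Pack build you reference supports that framework version.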

For relative URL handling, look at the Uri class.
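As a minimal sketch of that last point, `Uri.TryCreate` can combine the address of the page with each extracted value. The base address and the sample hrefs below are invented for the example; substitute the URL the page was actually downloaded from.

```csharp
using System;

class RelativeUrlDemo
{
    static void Main()
    {
        // Hypothetical address of the page the links were scraped from.
        Uri baseUri = new Uri("http://example.com/articles/index.htm");

        // Hypothetical values as they might appear in href or src attributes.
        string[] hrefs = { "page2.htm", "/images/logo.png", "http://other.example/x" };

        foreach (string href in hrefs)
        {
            Uri absolute;
            // Resolves a relative value against baseUri; an absolute value is kept as it is.
            if (Uri.TryCreate(baseUri, href, out absolute))
            {
                Console.WriteLine(absolute.AbsoluteUri);
            }
        }
    }
}
```

A value such as `page2.htm` is resolved against the directory of the base page, while a value with a leading slash is resolved against the site root.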