I am trying to use the Html Agility Pack to scrape some data from a site. I am really struggling to figure out how to use SelectNodes inside a foreach and then export the data to a list or array.
Here is the code I am working with so far.
string result = string.Empty;
HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.amazon.com/gp/offer-listing/B002UYSHMM/");
request.Method = "GET";
using (var stream = request.GetResponse().GetResponseStream())
using (var reader = new StreamReader(stream, Encoding.UTF8))
result = reader.ReadToEnd();
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(result);
HtmlNode root = doc.DocumentNode;
string itemdesc = doc.DocumentNode.SelectSingleNode("//h1[@class='producttitle']").InnerText; //this works perfectly to get the title of the item
//HtmlNodeCollection sellers = doc.DocumentNode.SelectNodes("//id['bucketnew']/div/table/tbody/tr/td/ul/a/img/@alt");//this does not work at all in getting the alt attribute from the seller images
HtmlNodeCollection prices = doc.DocumentNode.SelectNodes("//span[@class='price']"); //this works fine getting the prices
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class='resultsset']/table/tbody[@class='result']/tr"); // this is the code I am working on to collect each tr in the result. From each tr I want to add the span.price to one list and the alt attribute of the seller image to another. Once I get this working I will want an if statement for the case where the seller name is text instead of an image.
List<string> sellers = new List<string>();
List<string> prices = new List<string>();
foreach (HtmlNode node in nodes)
{
    HtmlNode seller = node.SelectSingleNode(".//img/@alt"); // I am not sure if this works
    sellers.Add(seller.SelectSingleNode("img").Attributes["alt"]); // this definitely does not work and will not compile
}
Your first problem: the commented-out SelectNodes call doesn't work because 'id' is not an element name, it's an attribute name. You've used the correct syntax in your other expressions for selecting on an attribute and comparing its value, e.g. //ElementName[@attributeName='value']. I think even //*[@attributeName='value'] should work when you don't want to name the element, but I have not tested this.
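As a rough illustration of the attribute-predicate syntax, here is a minimal sketch. The markup is made up for the example (the real page will differ), and the wildcard * is used so that no assumption is made about which element carries the id.

```csharp
using System;
using HtmlAgilityPack;

class XPathAttributeDemo
{
    static void Main()
    {
        // Hypothetical markup standing in for the real page.
        const string html =
            "<div id='bucketnew'><ul><li><img alt='Seller A' src='a.png'></li></ul></div>" +
            "<div id='bucketused'><ul><li><img alt='Seller B' src='b.png'></li></ul></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // id is an attribute, so it goes inside a predicate: element[@attribute='value'].
        // The * matches any element name.
        HtmlNodeCollection images =
            doc.DocumentNode.SelectNodes("//*[@id='bucketnew']//img");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (images != null)
        {
            foreach (HtmlNode img in images)
            {
                Console.WriteLine(img.GetAttributeValue("alt", string.Empty));
            }
        }
    }
}
```

The null check matters in practice: if the page layout changes and the expression stops matching, a foreach over the result would otherwise throw.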
The syntax inside the SelectNodes function is called XPath. An XPath reference or tutorial might help you out.
The seller node you are selecting is a descendant of node for the current iteration that is an img with an alt attribute. However, I think the syntax you want is simpler.
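One thing worth getting right inside the loop is that the XPath is relative to the current row. The sketch below assumes a made-up two-row table, with the price class name borrowed from the question. It selects the img element itself (filtered on having an alt attribute) rather than the attribute, then reads the attribute from the element.

```csharp
using System;
using HtmlAgilityPack;

class RelativeXPathDemo
{
    static void Main()
    {
        // Hypothetical two-row table standing in for the real results.
        const string html =
            "<table>" +
            "<tr><td><span class='price'>$10.00</span></td><td><img alt='Seller A'></td></tr>" +
            "<tr><td><span class='price'>$12.50</span></td><td><img alt='Seller B'></td></tr>" +
            "</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null) return;

        foreach (HtmlNode row in rows)
        {
            // ".//" searches below the current row; a bare "//" would start
            // again from the document root and return the same node every time.
            HtmlNode img = row.SelectSingleNode(".//img[@alt]");
            HtmlNode price = row.SelectSingleNode(".//span[@class='price']");

            // Either node can be null if the row does not contain it.
            Console.WriteLine("{0} | {1}",
                img?.GetAttributeValue("alt", string.Empty),
                price?.InnerText);
        }
    }
}
```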
For the next problem, where you say it won't compile, check the error message; it will probably be complaining about argument types. I think sellers.Add is looking for a string, since sellers is a List<string>, not an attribute, which is what the expression inside the Add call is returning.
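A minimal sketch of how the list could be filled, assuming made-up rows where one seller is shown as an image and one as plain text. The 'seller' class on the td is hypothetical and not taken from the real page. The point is that Attributes["alt"] gives an HtmlAttribute, and its Value property is the string that a List<string> can accept.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class SellerListDemo
{
    static void Main()
    {
        // Hypothetical rows: one seller as an image, one as plain text.
        const string html =
            "<table>" +
            "<tr><td class='seller'><img alt='Seller A'></td></tr>" +
            "<tr><td class='seller'>Seller B</td></tr>" +
            "</table>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var sellers = new List<string>();
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows != null)
        {
            foreach (HtmlNode row in rows)
            {
                HtmlNode img = row.SelectSingleNode(".//img[@alt]");
                if (img != null)
                {
                    // Attributes["alt"] is an HtmlAttribute; Value is its string content.
                    sellers.Add(img.Attributes["alt"].Value);
                }
                else
                {
                    // No image in this row: fall back to the cell's text.
                    HtmlNode cell = row.SelectSingleNode(".//td[@class='seller']");
                    if (cell != null) sellers.Add(cell.InnerText.Trim());
                }
            }
        }

        Console.WriteLine(string.Join(", ", sellers));
    }
}
```

The else branch is one way to handle the case mentioned in the question where the seller name is text instead of an image.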
Also, check out the Html Agility pack docs and other questions regarding syntax.