I'm just learning how to use HTML Agility Pack to scrape text from webpages. I'm trying to get the biographies of the heroes in Blizzard's Overwatch from their site. I'm currently using this to find the desired text and write it to a rich text box.
var paragraphs = page.DocumentNode.SelectNodes("//div[@class='hero-bio-backstory pad-sm']");
foreach(HtmlNode node in paragraphs)
<div class="hero-bio-backstory pad-sm">
//div[@class='hero-bio-backstory pad-sm'] returns one node: the entire div. When you then call InnerText on this node, it returns the text of the entire div, sans markup. That is why you are seeing the behavior you describe: your loop runs once, appends all the text in one chunk, then adds a single trailing newline.
You need to use an XPath expression that selects all the p nodes inside that div instead.
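A minimal sketch of that idea, assuming the p elements are direct children of the div (I haven't checked the real page, so the sample markup below is made up). `BioScraper` and `ExtractParagraphs` are hypothetical names used only for illustration, and the sketch writes to the console rather than to your rich text box so it stays self-contained:

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class BioScraper
{
    // Hypothetical helper: returns the text of each <p> inside the backstory div.
    public static List<string> ExtractParagraphs(HtmlDocument page)
    {
        var result = new List<string>();

        // Select the <p> children of the div rather than the div itself.
        // Assumption: the paragraphs are direct children; end the expression
        // with "//p" instead of "/p" if they are nested more deeply.
        var paragraphs = page.DocumentNode.SelectNodes(
            "//div[@class='hero-bio-backstory pad-sm']/p");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (paragraphs == null)
        {
            return result;
        }

        foreach (HtmlNode node in paragraphs)
        {
            // InnerText drops the markup; DeEntitize decodes entities such as &amp;.
            result.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }

        return result;
    }

    public static void Main()
    {
        // Made-up sample markup standing in for the real page.
        var page = new HtmlDocument();
        page.LoadHtml(
            "<div class=\"hero-bio-backstory pad-sm\">" +
            "<p>First paragraph.</p><p>Second paragraph.</p></div>");

        foreach (string text in ExtractParagraphs(page))
        {
            // The loop now runs once per paragraph, so each one gets its own line break.
            Console.WriteLine(text);
            Console.WriteLine();
        }
    }
}
```

In your own code you would keep the existing loop and only change the XPath string. Each node is then a single paragraph, so the newline you append after InnerText separates the paragraphs instead of trailing the whole block.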