Unchained Unchained - 2 months ago 14
C# Question

Not able to get img tag content

I'm using HtmlAgilityPack and I'm trying to get the content inside these two img tags:

<div style="padding-left: 27px;">
<img src="http://s1.swimg.net/gsmf/578/img/events/appearance.png" width="13" height="13" alt="Presenze" title="Presenze"> 6
<img src="http://s1.swimg.net/gsmf/578/img/events/G.png" width="13" height="13" alt="Goal" title="Goal"> 0
</div>


As you can see, neither img tag is closed. I'm trying to get 6 and 0 using this code:

Convert.ToInt32(div.SelectSingleNode(".//img[0]").InnerText.Trim())


The div variable contains the HTML above. The problem is that I get null from this part of the code: div.SelectSingleNode(".//img[0]").

Maybe it's because the tags are not closed. In fact, I see only one item inside the div variable, and it contains all the img tags.

How can I fix this?

Answer

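You got null primarily because XPath positions start at 1, not 0, so .//img[0] never matches anything. Fixing the index is not enough on its own, though: img is a void (empty) element, so HtmlAgilityPack treats the text after it as a sibling text node of the img, not as its content or inner text.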

Instead, use following-sibling::text() and limit the result with [1] to get the nearest text node after the img element. For example, to get the text after the first img element, use the following XPath:

//img[1]/following-sibling::text()[1]
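Applied to the div variable from the question, the same idea works with a relative path (.//) and a 1-based index, so .//img[2] gives the second value. The sketch below assumes the HtmlAgilityPack NuGet package is referenced; the class and method names (ImgTextByIndex, ValueAfterImg) are hypothetical, and int.TryParse is used instead of Convert.ToInt32 so a missing or non-numeric text node yields null rather than an exception.

```csharp
using System;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is referenced

public static class ImgTextByIndex
{
    // Hypothetical helper: returns the number written right after the index-th <img>
    // (1-based, as in XPath) inside the given div, or null if nothing usable is found.
    public static int? ValueAfterImg(HtmlNode div, int index)
    {
        // ".//" keeps the search relative to the div node; XPath positions start at 1.
        var query = $".//img[{index}]/following-sibling::text()[1]";
        var textNode = div.SelectSingleNode(query);
        if (textNode == null)
            return null;
        return int.TryParse(textNode.InnerText.Trim(), out var value) ? value : (int?)null;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(@"<div style=""padding-left: 27px;"">
<img src=""a.png"" alt=""Presenze""> 6
<img src=""b.png"" alt=""Goal""> 0
</div>");
        var div = doc.DocumentNode.SelectSingleNode("//div");

        Console.WriteLine(ValueAfterImg(div, 1));         // 6
        Console.WriteLine(ValueAfterImg(div, 2));         // 0
        Console.WriteLine(ValueAfterImg(div, 0) == null); // True: position 0 never matches
    }
}
```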

Complete demo:

var raw = @"<div style=""padding-left: 27px;"">
    <img src=""http://s1.swimg.net/gsmf/578/img/events/appearance.png"" width=""13"" height=""13"" alt=""Presenze"" title=""Presenze""> 6 
    <img src=""http://s1.swimg.net/gsmf/578/img/events/G.png"" width=""13"" height=""13"" alt=""Goal"" title=""Goal""> 0 
</div>";
var document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(raw);
var query = "//img[1]/following-sibling::text()[1]";
var txt = document.DocumentNode.SelectSingleNode(query);
Console.WriteLine(Convert.ToInt32(txt.InnerText.Trim()));


Output:

6
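If you need every value rather than one at a time, an alternative to indexed XPath is to select all img elements and read each one's NextSibling text node. This is a swapped-in approach, not the one shown in the answer above. The sketch assumes the HtmlAgilityPack NuGet package and that each number is the text node immediately following its img; the names ImgCounts and Read are hypothetical, and keying the result by the alt attribute is just one possible choice.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is referenced

public static class ImgCounts
{
    // Hypothetical helper: maps each img's alt text (e.g. "Presenze", "Goal")
    // to the number in the text node that directly follows that img.
    public static Dictionary<string, int> Read(HtmlNode div)
    {
        var result = new Dictionary<string, int>();
        var imgs = div.SelectNodes(".//img");
        if (imgs == null) // SelectNodes can return null when nothing matches
            return result;

        foreach (var img in imgs)
        {
            // <img> has no content, so the number lives in the next sibling text node.
            var next = img.NextSibling;
            if (next == null || next.NodeType != HtmlNodeType.Text)
                continue;
            if (int.TryParse(next.InnerText.Trim(), out var value))
                result[img.GetAttributeValue("alt", "")] = value;
        }
        return result;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(@"<div style=""padding-left: 27px;"">
<img src=""a.png"" alt=""Presenze""> 6
<img src=""b.png"" alt=""Goal""> 0
</div>");
        var counts = Read(doc.DocumentNode.SelectSingleNode("//div"));

        Console.WriteLine(counts["Presenze"]); // 6
        Console.WriteLine(counts["Goal"]);     // 0
    }
}
```

Keying by alt keeps the lookup independent of the img order, which may be handy if the page later adds or reorders icons.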