Mayuyukirin - 10 months ago
Question

Retrieve parts of text inside <li>

I have HTML like this:

<li class="in-ttl-b">(a) kanji; a Chinese character [ideograph]
<ul class="list-data-b-in"><li class="text-jejp text-c"><span class="ex">漢字で書く</span></li><li class="text-jeen text-c">write in <i>kanji</i> [<i>Chinese characters</i>]</li></ul>
<ul class="list-data-b-in"><li class="text-jejp text-c"><span class="ex">常用漢字</span></li><li class="text-jeen text-c"><i>Chinese characters</i> for everyday use (in Japan)</li></ul>

How can I get only the following text?
kanji; a Chinese character [ideograph]

Answer

You can get that by selecting the first text node that is a direct child of the outer li element. For example, assuming there can be more than one li with class="in-ttl-b":

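Dim lis = HTMLDoc.DocumentNode.SelectNodes("//li[@class='in-ttl-b']")
If lis IsNot Nothing Then ' SelectNodes returns Nothing when no node matches
    For Each li As HtmlNode In lis
        ' select the first text node that is a direct child of <li>
        Dim txt = li.SelectSingleNode("text()[1]")
        If txt IsNot Nothing Then Console.WriteLine(txt.InnerText.Trim())
    Next
End If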


Output:

(a) kanji; a Chinese character [ideograph]
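
Note that the text node still carries the "(a) " label, while the question asks for the text without it. If that matters, one possible way to drop it is a small regular expression applied to the node's InnerText. The sketch below is only an illustration: StripLabel is a hypothetical helper name, and the pattern assumes the label is always a single lowercase letter in parentheses at the start of the text.

```vb
Imports System
Imports System.Text.RegularExpressions

Module StripLabelDemo
    ' Hypothetical helper (not part of HtmlAgilityPack): trims surrounding
    ' whitespace, then removes a leading "(a)"-style label.
    ' Assumption: the label is one lowercase letter in parentheses.
    Function StripLabel(raw As String) As String
        Return Regex.Replace(raw.Trim(), "^\([a-z]\)\s*", "")
    End Function

    Sub Main()
        ' Stand-in for txt.InnerText, which usually ends with a line break
        Dim raw As String = "(a) kanji; a Chinese character [ideograph]" & Environment.NewLine
        Console.WriteLine(StripLabel(raw))
    End Sub
End Module
```

Inside the loop from the answer, this would be called as StripLabel(txt.InnerText). Text that does not start with such a label is returned unchanged apart from the trimming.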