Mayuyukirin - 2 months ago
VB.NET Question

Using HtmlAgilityPack to get data

<ol class="list-data-b">
<li class="in-ttl-b">(a) kanji; a Chinese character [ideograph]
<ul class="list-data-b-in"><li class="text-jejp text-c"><span class="ex">漢字で書く</span></li><li class="text-jeen text-c">write in <i>kanji</i> [<i>Chinese characters</i>]</li></ul>
<ul class="list-data-b-in"><li class="text-jejp text-c"><span class="ex">常用漢字</span></li><li class="text-jeen text-c"><i>Chinese characters</i> for everyday use (in Japan)</li></ul>
</li>
</ol>


I have HTML like the above. How can I extract the following pieces of data?


  • (a) kanji; a Chinese character [ideograph]

  • 漢字で書く

  • write in kanji [Chinese characters]

  • 常用漢字

  • Chinese characters for everyday use (in Japan)



This is my code.

Dim node2 = HTMLDoc.DocumentNode.SelectNodes("//ul[@class='list-data-b-in']")
If node2 IsNot Nothing Then
    For Each node In node2
        Dim Japnodes As HtmlAgilityPack.HtmlNode = node.SelectSingleNode("//li[@class='text-jejp text-c']")
        txtMean.AppendText(Japnodes.InnerText)
        txtMean.AppendText(vbNewLine)
        Dim Engnodes As HtmlAgilityPack.HtmlNode = node.SelectSingleNode("//li[@class='text-jeen text-c']")
        txtMean.AppendText(Engnodes.InnerText)
        txtMean.AppendText(vbNewLine)
    Next
End If

Answer

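Selecting the first text can be done as explained in your previous question. To get each Japanese/English pair, iterate through the ul elements and, from each ul, select the two elements that contain the target text.

Note that an XPath expression starting with // is evaluated from the document root even when SelectSingleNode is called on a child node. That is why, in your loop, every iteration returns the first matching li in the whole document. Paths used inside the loop should be relative to the current node.

Here is a console application demo: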
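The loop from the question can be kept almost as-is if the two inner XPath expressions are made relative to the current ul (no leading //). Below is a minimal sketch, assuming HtmlAgilityPack is referenced. The QuestionLoopFixed module and GetPairs function are hypothetical names, and the lines are returned in a list instead of being appended to txtMean.

```vb
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module QuestionLoopFixed
    ' Hypothetical helper: same loop as in the question, but it collects the
    ' text in a list instead of appending it to txtMean.
    Function GetPairs(doc As HtmlDocument) As List(Of String)
        Dim lines As New List(Of String)()
        Dim uls As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//ul[@class='list-data-b-in']")
        ' SelectNodes returns Nothing when nothing matches
        If uls Is Nothing Then Return lines

        For Each ul As HtmlNode In uls
            ' No leading "//": these paths select <li> children of the current <ul> only,
            ' so each iteration reads its own pair.
            Dim japNode As HtmlNode = ul.SelectSingleNode("li[@class='text-jejp text-c']")
            Dim engNode As HtmlNode = ul.SelectSingleNode("li[@class='text-jeen text-c']")
            If japNode IsNot Nothing Then lines.Add(japNode.InnerText)
            If engNode IsNot Nothing Then lines.Add(engNode.InnerText)
        Next
        Return lines
    End Function
End Module
```

In the form, each returned line can then be written with txtMean.AppendText(line & vbNewLine), which keeps the original output format.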

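' Assumes Imports HtmlAgilityPack at the top of the file
Dim lis = HTMLDoc.DocumentNode.SelectNodes("//li[@class='in-ttl-b']")
If lis IsNot Nothing Then
    For Each li As HtmlNode In lis
        ' First text node of the <li>: the heading before the nested <ul> elements
        Dim txt = li.SelectSingleNode("text()[1]")
        Console.WriteLine(txt.InnerText)
        ' "ul" is relative, so only the <ul> children of this <li> are visited
        For Each ul As HtmlNode In li.SelectNodes("ul")
            Dim japNode = ul.SelectSingleNode("li/span")
            Dim engNode = ul.SelectSingleNode("li[@class='text-jeen text-c']")

            Console.WriteLine(japNode.InnerText)
            Console.WriteLine(engNode.InnerText)
        Next
    Next
End If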

dotnetfiddle demo

Output:

(a) kanji; a Chinese character [ideograph]

漢字で書く
write in kanji [Chinese characters]
常用漢字
Chinese characters for everyday use (in Japan)
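For reference, here is a self-contained version of the console demo. It is a sketch assuming the HtmlAgilityPack NuGet package is referenced, with the HTML from the question embedded as a string. The Program module, SampleHtml and ExtractLines are illustrative names; unlike the snippet above, it collects the lines in a list, trims the heading text and guards against missing nodes.

```vb
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module Program
    ' The HTML from the question, embedded so the example needs no network access
    Public Const SampleHtml As String = "<ol class=""list-data-b"">" &
        "<li class=""in-ttl-b"">(a) kanji; a Chinese character [ideograph]" &
        "<ul class=""list-data-b-in""><li class=""text-jejp text-c""><span class=""ex"">漢字で書く</span></li>" &
        "<li class=""text-jeen text-c"">write in <i>kanji</i> [<i>Chinese characters</i>]</li></ul>" &
        "<ul class=""list-data-b-in""><li class=""text-jejp text-c""><span class=""ex"">常用漢字</span></li>" &
        "<li class=""text-jeen text-c""><i>Chinese characters</i> for everyday use (in Japan)</li></ul>" &
        "</li></ol>"

    ' Returns each heading followed by its Japanese/English pairs, in document order
    Function ExtractLines(html As String) As List(Of String)
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)
        Dim lines As New List(Of String)()

        Dim lis As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//li[@class='in-ttl-b']")
        If lis Is Nothing Then Return lines

        For Each li As HtmlNode In lis
            ' First text node of the <li>; Trim drops whitespace before the nested <ul>
            Dim txt As HtmlNode = li.SelectSingleNode("text()[1]")
            If txt IsNot Nothing Then lines.Add(txt.InnerText.Trim())

            Dim uls As HtmlNodeCollection = li.SelectNodes("ul")
            If uls Is Nothing Then Continue For
            For Each ul As HtmlNode In uls
                Dim japNode As HtmlNode = ul.SelectSingleNode("li/span")
                Dim engNode As HtmlNode = ul.SelectSingleNode("li[@class='text-jeen text-c']")
                If japNode IsNot Nothing Then lines.Add(japNode.InnerText)
                If engNode IsNot Nothing Then lines.Add(engNode.InnerText)
            Next
        Next
        Return lines
    End Function

    Sub Main()
        For Each line As String In ExtractLines(SampleHtml)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Running Main should print the heading followed by the two Japanese/English pairs, one item per line.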