user818455 - 1 month ago
HTML Question

How to extract text outside of a html tag using jsoup?

I have the following HTML code:

Data


<div class="alg"></div>
<div class="alg"></div>





Pepsi
791



<div class="alg"></div>
<div class="alg"></div>


Coke
700



<div class="gap"></div>
<div class="gap"></div>


I want to extract all of the values: Pepsi, 791, Coke, 700. I tried the following code using Jsoup:

Document doc = Jsoup.parse(html);

for (Element element : doc.select("div.alg")) { // Select all the div tags with class "alg"
    TextNode next = (TextNode) element.nextSibling(); // Get the next node of each div as a TextNode
    System.out.println(next.text()); // Print the text of the TextNode
}


But the above code always prints an empty string ("").

Answer

Try this:

Document doc = Jsoup.parse(url, 30000); // url is a java.net.URL; 30000 is the timeout in milliseconds
for (Element element : doc.select(".gap")) { // Select every element with class "gap"
    Node next = element.nextSibling();
    StringBuilder sb = new StringBuilder();
    // Collect the run of consecutive text nodes that follows the element
    while (next instanceof TextNode) {
        sb.append(((TextNode) next).text());
        next = next.nextSibling();
    }
    System.out.println(sb.toString()); // Print the collected text
}
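
The loop matters because nextSibling() returns whichever node comes immediately after the element. That node may be a whitespace-only text node, or another element, in which case a direct cast to TextNode throws a ClassCastException. Below is a self-contained sketch of the same idea that parses from a string instead of a URL. The sample markup is made up for illustration (it is not the asker's actual page), and the SiblingText class and textAfter helper are hypothetical names, not part of jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class SiblingText {

    // Hypothetical helper: for each element matched by cssQuery, gather the
    // consecutive TextNode siblings that follow it and keep the non-blank results.
    public static List<String> textAfter(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        List<String> results = new ArrayList<>();
        for (Element element : doc.select(cssQuery)) {
            StringBuilder sb = new StringBuilder();
            Node next = element.nextSibling();
            // Stops at the first non-text sibling (or at null when there is none)
            while (next instanceof TextNode) {
                sb.append(((TextNode) next).text());
                next = next.nextSibling();
            }
            String text = sb.toString().trim();
            if (!text.isEmpty()) { // Skip elements followed only by whitespace or by another element
                results.add(text);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Assumed markup: the values sit as bare text right after the second div.alg of each pair
        String html = "<div class=\"alg\"></div><div class=\"alg\"></div> Pepsi 791 "
                    + "<div class=\"alg\"></div><div class=\"alg\"></div> Coke 700 "
                    + "<div class=\"gap\"></div>";
        for (String value : textAfter(html, "div.alg")) {
            System.out.println(value);
        }
    }
}
```

With this assumed markup the selector is div.alg, because that is the element the text follows here. Use whichever selector matches the element that directly precedes the text in your real page.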