goat goat - 2 years ago 262
HTML Question

Jsoup - Extract from html string with the same span class name

I'm still new to HTML. For an Android project, I need to extract some data from an HTML string using jsoup. The structure looks something like this: all the span tags have the same class name, and the data I need sits between them.

<span class="head">a</span>
xxxx data xxxx
<span class="head">b</span>
xxxx data xxxx
<span class="head">c</span>
xxxx data xxxx

Is there any way I could extract it?

Answer Source

There are 2 things you have to do:

  • select all elements that precede the text nodes you are interested in,
  • use the nextSibling() method to get the text node that follows each of them.

Take a look at this sample code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.TextNode;

public class JsoupExample {

    public static void main(String[] args) {
        String html = "<span class=\"head\">a</span>\n" +
                "xxxx data xxxx\n" +
                "<span class=\"head\">b</span>\n" +
                "xxxx data xxxx\n" +
                "<span class=\"head\">c</span>\n" +
                "xxxx data xxxx";

        Document document = Jsoup.parse(html);

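        for (Element span : document.select("span.head")) {
            // The data after each span is that span's next sibling node.
            TextNode node = (TextNode) span.nextSibling();

            // text() collapses whitespace but can leave a leading or
            // trailing space from the line breaks, so trim before comparing.
            String data = node.text().trim();

            assert "xxxx data xxxx".equals(data);
            System.out.println(data);
        }
    }
}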

It uses your input and shows both steps.

With document.select("span.head") we select all span elements with the class head, then we iterate over them with an enhanced for loop. For each span we get the text node we are interested in using: TextNode node = (TextNode) span.nextSibling(); Finally, we check with an assertion that the text node holds the value we expect, and we print it to standard output.
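If your real HTML is less regular than the sample, the direct cast to TextNode can fail: nextSibling() returns null when the span is the last node, and it returns an Element when another tag follows immediately. Below is a minimal sketch of a more defensive variant, assuming you want each span's label mapped to the text after it. The class name SpanDataExtractor, the extract helper, and the sample strings "first" and "second" are made up for illustration and are not part of jsoup. Note that repeated labels would overwrite each other in the map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

final class SpanDataExtractor {

    // Hypothetical helper (not a jsoup API): maps each span.head label to
    // the text that directly follows it, keeping document order.
    static Map<String, String> extract(String html) {
        Document document = Jsoup.parse(html);
        Map<String, String> result = new LinkedHashMap<>();

        for (Element span : document.select("span.head")) {
            // null when the span is the last node of its parent
            Node next = span.nextSibling();

            // Accept only text nodes; skip spans followed by a tag or by nothing.
            if (next instanceof TextNode) {
                // text() collapses whitespace runs; trim() removes the
                // space left over from the surrounding line breaks.
                String data = ((TextNode) next).text().trim();
                if (!data.isEmpty()) {
                    result.put(span.text(), data);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<span class=\"head\">a</span>\n"
                + "first\n"
                + "<span class=\"head\">b</span>\n"
                + "second\n"
                + "<span class=\"head\">c</span>"; // no data after the last span

        System.out.println(extract(html));
    }
}
```

Here the last span has nothing after it, so it is simply left out of the result instead of causing a NullPointerException.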

Modify this code sample for your needs. I hope it helps you.
