EyfI EyfI - 1 month ago 9
Java Question

Extract text from only some divs in the same class with jsoup

I would like to extract text from a specific <div> of a website using jsoup, but I'm not sure how.

The problem is that I want to get the text from a div that has class="name".

But there can be more <div>s with this class (and I don't want to get the text from those).

It looks like this in the HTML file:

.
.
<div class="name">
Some text I don't want
<span class="a">Tree</span>
</div>
.
.
<div class="name">Some text I do want</div>
.
.


So the only difference is that the <div> I want the text from does not have a <span> inside of it. But I have not found a way to use that as a key to extract the text in jsoup.

Is it possible?

Answer

Use JSoup's selector syntax. For instance, to select all divs with class="name", use

Elements nameElements = doc.select("div.name");

Note that the text you "do" and "don't" want above sits in the same relative HTML location (directly inside a div with class="name"), so this selector alone cannot tell them apart. HTML and JSoup will see them the same.

If you want to avoid elements containing span elements, then one way is to iterate through the elements obtained above and test by selector if they have span elements or not:

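    // doc is an org.jsoup.nodes.Document that has already been parsed
    Elements nameElements = doc.select("div.name");

    for (Element element : nameElements) {
        // select("span") only looks at descendants of this particular div
        if (element.select("span").isEmpty()) {
            // no span inside: this is the div whose text is wanted
            System.out.println("No span");
            System.out.println(element.text());
            System.out.println();
        } else {
            // a span is present: printed here only to show the difference
            System.out.println("span");
            System.out.println(element.text());
            System.out.println();
        }
    }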
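
As an alternative to looping and testing each element, the same filter can probably be pushed into the selector itself, since jsoup's selector syntax includes the :not(...) and :has(...) pseudo-selectors. The following is only a sketch: the class name NameDivFilter and the method textsWithoutSpan are made up for illustration, and the HTML string is a shortened copy of the markup from the question.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical helper class, not part of jsoup.
public class NameDivFilter {

    // Returns the text of every div.name that has no span descendant.
    static List<String> textsWithoutSpan(String html) {
        Document doc = Jsoup.parse(html);

        // :has(span) matches elements that contain a span;
        // wrapping it in :not(...) keeps only the divs without one.
        Elements wanted = doc.select("div.name:not(:has(span))");

        List<String> texts = new ArrayList<>();
        for (Element element : wanted) {
            texts.add(element.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Shortened version of the HTML from the question.
        String html =
                "<div class=\"name\">Some text I don't want <span class=\"a\">Tree</span></div>"
              + "<div class=\"name\">Some text I do want</div>";

        for (String text : textsWithoutSpan(html)) {
            System.out.println(text);
        }
    }
}
```

With the two divs from the question this should print only the text of the second one. Whether the selector or the explicit loop is nicer is a matter of taste: the loop is easier to extend if you later need both groups, while the selector keeps the filtering in one place.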