Saket Sinha - 1 year ago
HTML Question

How to use :empty pseudo selector in jsoup

I want to select the div tag that contains no nested div or any other tag.
I tried the code below and expected the output "This is output",
but the :empty pseudo-selector isn't working.

String htmlString =
"<html><div><div><div><p><b>This is first line</b></p> </div><b>This is second line</b></div><div>This is output</div><div><span style=\"color:blue\">This is third line</span></div></html>";

org.jsoup.nodes.Document doc1 = Jsoup.parse(htmlString);

Elements elements1 = doc1.select("html:empty");

for (Element element : elements1) {
    System.out.println(element.text());
}

Answer Source

Since you recently posted a couple of similar questions in which your HTML structure changed and the CSS selector broke, it may suit you better to avoid CSS selectors and filter the elements yourself:

String htmlString = "<html><p><b>This has no div</b></p><div><div><div><p><b>This is first line</b></p></div><b>This is second line</b></div><div>This is output</div><div><span style=\"color:blue\">This is third line</span></div></html>";

Document doc = Jsoup.parse(htmlString);

Elements elements = doc.getAllElements();

// for all text nodes
outerloop:
for (Element element : elements) {
    // only consider elements whose first child node is a text node
    if (element.childNodes().size() > 0 && element.childNode(0).nodeName().equals("#text")) {
        Element divContent = element;
        if (element.nodeName().equals("div")) {
            System.out.println("No element in div; text: " + element.text() + "\n");
        } else {
            // climb up until the parent is a <div>
            while (divContent.parents().size() > 0 && !divContent.parent().nodeName().equals("div")) {
                divContent = divContent.parent();
                // assumed check (the original condition did not survive): <body> reached without meeting a <div>
                if (divContent.parent().nodeName().equals("body")) {
                    continue outerloop; // continue, to skip element <p><b>This has no div</b></p>
                    //break; // break, if you want the element <p><b>This has no div</b></p> anyway
                }
            }
            System.out.println("element: " + divContent.toString());
            System.out.println("text: " + element.text() + "\n");
        }
    }
}

// only for <div>text...</div>
for (Element element : elements) {
    if (element.childNodes().size() > 0 && element.childNode(0).nodeName().equals("#text") && element.nodeName().equals("div")) {
        System.out.println("text: " + element.text());
    }
}

Output:

element: <p><b>This is first line</b></p>
text: This is first line

element: <b>This is second line</b>
text: This is second line

No element in div; text: This is output

element: <span style="color:blue">This is third line</span>
text: This is third line

text: This is output
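
As for why :empty did not help: it matches only elements with no child nodes at all, and a div that holds text has a text child node, so it is not empty. If a selector is still preferred, a rough sketch is to combine :not and :has so that only divs without any descendant element remain. The class name LeafDivs and the method leafDivTexts are made up for illustration, and the sketch assumes jsoup is on the classpath in a version that supports both pseudo-selectors:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of jsoup.
public class LeafDivs {

    // Returns the text of every <div> that contains no child elements.
    static List<String> leafDivTexts(String html) {
        Document doc = Jsoup.parse(html);
        // :has(*) matches elements with at least one descendant element,
        // so div:not(:has(*)) keeps only divs that hold text (or nothing).
        Elements leaves = doc.select("div:not(:has(*))");
        List<String> texts = new ArrayList<>();
        for (Element div : leaves) {
            texts.add(div.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<html><div><div><div><p><b>This is first line</b></p> </div>"
                + "<b>This is second line</b></div><div>This is output</div>"
                + "<div><span style=\"color:blue\">This is third line</span></div></html>";
        System.out.println(leafDivTexts(html));
    }
}
```

For the markup in the question this should leave only the "This is output" div. As with any selector, it may break again if the HTML structure changes.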