Helen - 27 days ago
Java Question

Get links from an HTML table, except those inside a div tag, with Jsoup

I have this HTML:

<table width="100%" cellpadding="5" cellspacing="2" class="zebra">
<tr>
<td colspan="5">
<div class="paginator">
<a href="http://some_link">2</a>&nbsp;
</div>
</td>
</tr>
<tr>
<td><a href="//i_need_only_this_link">some_value</a></td>
</tr>
<tr>
<td><a href="//i_need_only_this_link1">some_value</a></td>
</tr>
<tr>
<td colspan="2">
<div class="paginator">
<a href="http://some_link">2</a>&nbsp;
</div>
</td>
</tr>
</table>


I use Jsoup. How can I get all the links except the ones inside the div tags?
I tried something like this, but it doesn't work: the resulting Elements still contains all the links.

org.jsoup.select.Elements tableText = doc.select("table.zebra").not("tr td div.paginator");

for (org.jsoup.nodes.Element td : tableText.select("td a")) {
    System.out.println(td.attr("href")); // http://some_link
    ....
}

Answer

You can use the code below:

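import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// htmlStr holds the HTML from the question
Document html = Jsoup.parse(htmlStr);

// Print the href of every anchor whose immediate parent is not a <div>
for (Element e : html.getElementsByTag("a")) {
    if (!"div".equalsIgnoreCase(e.parentNode().nodeName())) {
        System.out.println(e.attr("href"));
    }
}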

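Here I check whether the parent node of each anchor element is a div; if it is not, I print the URL. Note that this only looks at the immediate parent, so a link wrapped in another tag inside the div (for example <div><span><a ...>) would still be printed.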
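If the links may sit deeper inside the paginator div, one option is to walk up all the ancestors of each anchor instead of checking only the direct parent. The following is a minimal sketch, not the only way to do it: the class name TableLinks and the helpers insidePaginator and contentLinks are made up for illustration, and the HTML in main is a shortened copy of the table from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class TableLinks {

    // Hypothetical helper: true if any ancestor is a <div class="paginator">.
    static boolean insidePaginator(Element element) {
        for (Element ancestor : element.parents()) {
            if ("div".equals(ancestor.tagName()) && ancestor.hasClass("paginator")) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper: hrefs of all links in table.zebra outside a paginator div.
    static List<String> contentLinks(String htmlStr) {
        Document doc = Jsoup.parse(htmlStr);
        List<String> hrefs = new ArrayList<>();
        for (Element link : doc.select("table.zebra a[href]")) {
            if (!insidePaginator(link)) {
                hrefs.add(link.attr("href"));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Shortened version of the table from the question.
        String htmlStr = "<table class=\"zebra\">"
                + "<tr><td colspan=\"5\"><div class=\"paginator\">"
                + "<a href=\"http://some_link\">2</a>&nbsp;</div></td></tr>"
                + "<tr><td><a href=\"//i_need_only_this_link\">some_value</a></td></tr>"
                + "<tr><td><a href=\"//i_need_only_this_link1\">some_value</a></td></tr>"
                + "</table>";
        for (String href : contentLinks(htmlStr)) {
            System.out.println(href);
        }
    }
}
```

Restricting the initial select to table.zebra also keeps links from other parts of the page out of the result, which the getElementsByTag("a") version does not do.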