helloimyourmind - 1 year ago
Java Question

Jsoup: take text and url

I have this HTML block:

<div class="singolo-contenuto link_azure">
<a href="http://example.com">Name of URL</a></p></p>
<ul class="list_attachments"><li><a
href="DON'T TOUCH"><img src='/img/fileicons/file.png' alt='file'/> TITLE</a></li></ul>
<div class="clear"></div>

Currently I'm extracting the text with:


That returns:
"I'm a TEXTXXXXXXXXXXXXXXXX Name of URL". Is it possible to get "I'm a TEXTXXXXXXXXXXXXXXXX http://example.com Name of URL" instead?

are not always the same in all the pages.
I'm only sure that the text and the href will be inside the "singolo-contenuto link_azure" class.

Answer Source

You can replace each link with a text node containing its href followed by its text, then call .text() on the container.

pseudo code:

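// Requires org.jsoup.nodes.Element and org.jsoup.nodes.TextNode.
for (Element elem : document.select(".singolo-contenuto a")) {
    // Leave the links inside the attachments list untouched.
    if (elem.parents().hasClass("list_attachments")) continue;
    String href = elem.attr("href");
    String text = elem.text();
    // Swap the <a> element for plain text of the form "href text".
    // Newer jsoup releases may only accept new TextNode(href + " " + text).
    elem.replaceWith(new TextNode(href + " " + text, ""));
}
String result = document.select(".singolo-contenuto").text();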
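
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It assumes jsoup is on the classpath. The class name LinkTextExtractor, the method textWithUrls and the sample HTML are made up for illustration (the question does not show where the plain text sits in the markup). Instead of replacing the link with a TextNode, this variant prepends the href to the link's own text with prependText, which should give the same result from .text() without depending on a particular TextNode constructor.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of the original answer.
class LinkTextExtractor {

    // Returns the text of the .singolo-contenuto container with each link's
    // href placed in front of the link text, skipping attachment links.
    static String textWithUrls(String html) {
        Document document = Jsoup.parse(html);
        for (Element link : document.select(".singolo-contenuto a[href]")) {
            // Links inside the attachments list are left as they are.
            if (link.parents().hasClass("list_attachments")) continue;
            // Put "href " before the existing link text.
            link.prependText(link.attr("href") + " ");
        }
        return document.select(".singolo-contenuto").text();
    }

    public static void main(String[] args) {
        // Assumed sample markup, loosely based on the block in the question.
        String html =
            "<div class=\"singolo-contenuto link_azure\">"
            + "<p>I'm a TEXT <a href=\"http://example.com\">Name of URL</a></p>"
            + "<ul class=\"list_attachments\"><li>"
            + "<a href=\"DON'T TOUCH\">TITLE</a></li></ul>"
            + "<div class=\"clear\"></div></div>";
        System.out.println(textWithUrls(html));
    }
}
```

Note that the attachment title still appears in the result, because .text() collects the text of the whole container; only its href is left out. If the attachments should be dropped entirely, removing the .list_attachments element before calling .text() would be one option.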