alexgophermix alexgophermix - 1 year ago 99
Android Question

Absolute URL incorrect when converted from relative URL Android JSoup

I'm trying to parse out navigation links from various sites.

I've been having issues with one particular site, which uses a relative link format prefixed with ./ (see the href in the snippet below).

Here is the code snippet with relevant param values in comments:

// url =
// selector = ".next a"
// ele = <a href="./strip/1457">Next</a>
// attr = "href"
Element ele = doc.select(selector).first(); // doc is the Document parsed from url
String absoluteUrl = ele.absUrl(attr).trim().replaceAll("\n", "");

Jsoup returns:

when in fact the real link is:

From my understanding, Jsoup is giving the correct link here, since ./ refers to the current directory, meaning that the anchor is written incorrectly on the site. However, Chrome, Firefox and IE all resolve the relative URL to point to the next strip instead. Is there any way I can correct for this behaviour without breaking relative URLs in other cases?

Answer Source

The problem:

If you have a look at the <head> of the HTML source, you will find:

    <base href="" />

What does it mean?

For all relative URLs in the document, this will be used as the base (so this is what the current directory ./ refers to). See:


Jsoup already detects the <base> tag, and ele.absUrl("href") would (and does, I just tested it) return the correct link. However, you are overriding the correct base with ele.setBaseUri(url);, so remove that line of code.
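To make the difference concrete, here is a minimal sketch of how the same relative href resolves against two different bases. It uses the standard java.net.URI class rather than Jsoup, and the example.com URLs are hypothetical stand-ins for the page URL and the <base href> value, so treat it as an illustration of the resolution rule, not of Jsoup's internals.

```java
import java.net.URI;

class BaseResolutionDemo {
    // Resolve a (possibly relative) href against a base URL.
    static String resolve(String base, String href) {
        return URI.create(base).resolve(href).toString();
    }

    public static void main(String[] args) {
        String href = "./strip/1457";

        // Hypothetical page URL used as the base (what setBaseUri(url) forces).
        System.out.println(resolve("http://example.com/strip/1456", href));
        // prints http://example.com/strip/strip/1457

        // Hypothetical <base href> value used as the base (what browsers do).
        System.out.println(resolve("http://example.com/", href));
        // prints http://example.com/strip/1457
    }
}
```

With the page URL as the base, ./ points at the directory of the current strip, which produces the doubled path segment. With the <base> value as the base, the same href lands on the next strip.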

If you want to handle setting the correct base yourself, just parse the head for a <base> element:

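String url = "";

// doc is the Document parsed from url
Element base = doc.select("head > base[href]").first();

// Fall back to the page URL when the page declares no <base href>
String baseUrl = base != null ? base.attr("href") : url;

Element ele = doc.select("#comic > div > > ul > > a").first();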
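The fallback logic above (use the <base> href when present, otherwise the page URL) can be sketched without Jsoup as a small helper. This is only an assumption-laden illustration: the class and method names are hypothetical, the example.com URLs are placeholders, and it relies on java.net.URI for resolution, which may differ from Jsoup in edge cases.

```java
import java.net.URI;

class LinkResolver {
    // Pick the effective base: the <base href> value if the page declares one,
    // otherwise the page URL itself. A relative <base href> is resolved
    // against the page URL first.
    static URI effectiveBase(String pageUrl, String baseHref) {
        URI page = URI.create(pageUrl);
        if (baseHref == null || baseHref.trim().isEmpty()) {
            return page;
        }
        return page.resolve(baseHref.trim());
    }

    // Resolve a link href against the effective base of the page.
    static String resolveLink(String pageUrl, String baseHref, String href) {
        return effectiveBase(pageUrl, baseHref).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://example.com/strip/1456"; // hypothetical page URL

        // Page declares a <base href>: the link resolves against it.
        System.out.println(resolveLink(page, "http://example.com/", "./strip/1457"));
        // prints http://example.com/strip/1457

        // No <base> element: the link resolves against the page URL.
        System.out.println(resolveLink(page, null, "./strip/1457"));
        // prints http://example.com/strip/strip/1457
    }
}
```

Because the helper only overrides the base when a <base href> is actually present, relative URLs on pages without one keep resolving against the page URL as before.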
