imoteb - 7 months ago
Java Question

Java Jsoup extracting "alt"

I am trying to crawl this web page: http://www.bbc.com/earth/columns/record-breakers.
When I try to get all the available links, my program returns only part of each link.

As you can see in the picture, the href attribute value contains only part of the actual link. On the website, when I hover over an article, a small box with the full link appears in the bottom-left corner of the screen.

I don't know much HTML, but I just learned that this is called the "alt" attribute, so my question is: how can I get the information that appears in the corner using Jsoup?

Answer

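The box the browser shows in the corner is not an "alt" attribute; it is the link's href resolved against the page's URL. The href values on this page appear to be relative, which is why only part of the link is returned. Use the abs: attribute prefix to resolve an absolute URL from an attribute. Example for the page above: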

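    import java.io.IOException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class AbsLinks { // class name is arbitrary
        public static void main(String[] args) throws IOException {
            Document doc = Jsoup.connect("http://www.bbc.com/earth/columns/record-breakers").get();
            // Anchors inside each article's promo header
            Elements links = doc.select("div.promo-unit-header a");

            for (Element e : links) {
                // "abs:" resolves the href against the document's base URI;
                // e.absUrl("href") is equivalent
                System.out.println(e.attr("abs:href"));
            }
        }
    }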
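
To see what the abs: prefix is doing, it can help to resolve a relative href against the page URL by hand. The following is only a rough sketch using the JDK's java.net.URI rather than Jsoup itself; the class name ResolveDemo, the helper resolve, and the example hrefs are hypothetical and are not taken from the BBC page. Jsoup's own resolver may behave differently in edge cases, such as malformed URLs.

```java
import java.net.URI;

class ResolveDemo {
    // Resolve an href (relative or absolute) against the URL of the page it came from.
    static String resolve(String baseUri, String href) {
        return URI.create(baseUri).resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.bbc.com/earth/columns/record-breakers";

        // Hypothetical root-relative href: scheme and host come from the base
        System.out.println(resolve(base, "/earth/story/example-article"));
        // prints http://www.bbc.com/earth/story/example-article

        // An href that is already absolute is returned unchanged
        System.out.println(resolve(base, "http://example.com/page"));
        // prints http://example.com/page
    }
}
```

With Jsoup the base URI is taken from the fetched document, so e.attr("abs:href") needs no extra arguments when the page was loaded with Jsoup.connect(...).get().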