Tim The Learner - 4 months ago
Android Question

Trying to scrape some links with jsoup and need help selecting specific elements

I'm fairly new to Java and Android and am working on a personal project. I have no HTML knowledge, however, which is why I'm struggling with jsoup. What I'd like to do is grab a link, in this case to a webm file, and store it in a String. The page I'm scraping is http://www.hearthpwn.com/cards/503-ragnaros-the-firelord, and the link I want is on line 1010 when viewing the page source. I'd like this method to work on different pages, so I don't want to scrape by line number. If someone could show me how to scrape only the link associated with "data-animationurl=" for consistency, that'd be great. Thanks.

Answer

More information about jsoup is available in its official documentation. You'll want to wrap this in an AsyncTask so your app doesn't hang, but this should give you a good start:

import java.io.IOException;
import java.util.Map;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

try {
    // Connect to the URL and set a user agent so the request isn't blocked
    Connection connect = Jsoup.connect("http://www.hearthpwn.com/cards/503-ragnaros-the-firelord");
    connect.userAgent("Mozilla/5.0");

    // Fetch the HTML and select the first <video class="hscard-video" ...> element
    Document doc = connect.get();
    Element video = doc.select("video.hscard-video").first();

    // first() returns null when nothing matches, so guard before reading attributes
    if (video != null) {
        // Grab all data-* attributes as a map (e.g. data-href, data-usegold...)
        Map<String, String> dataSet = video.dataset();

        // If data-animationurl exists, print it (here you can store it in a String instead)
        if (dataSet.containsKey("animationurl")) {
            System.out.println(dataSet.get("animationurl"));
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}
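
For the "so your app doesn't hang" part, here is a rough sketch of running the scrape off the calling thread. It uses a plain java.util.concurrent executor rather than AsyncTask (which has been deprecated since API level 30), so treat it as one possible approach and not the method from the answer above. The class name BackgroundFetch, the method fetchInBackground, and the placeholder string are all made up for illustration; in a real app the jsoup code would go inside the task and return the URL.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper (not part of jsoup or Android): runs a task on a worker thread.
public class BackgroundFetch {
    // One daemon worker thread, so network work never runs on the caller's thread
    // and the worker does not keep the JVM alive on its own.
    private static final ExecutorService EXECUTOR =
            Executors.newSingleThreadExecutor(runnable -> {
                Thread worker = new Thread(runnable, "scrape-worker");
                worker.setDaemon(true);
                return worker;
            });

    // Submits the task to the worker thread and returns a Future for its result.
    public static Future<String> fetchInBackground(Callable<String> task) {
        return EXECUTOR.submit(task);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder task: this is where the jsoup scrape would run and return the link.
        Future<String> result = fetchInBackground(() -> "placeholder-animation-url");

        // get() blocks until the task finishes, which is fine in a command-line demo.
        // On Android, hand the result back to the UI thread instead of blocking on it.
        System.out.println(result.get());
    }
}
```

The task is passed in as a Callable so the threading stays separate from the scraping, and any IOException thrown inside the task surfaces as an ExecutionException from get().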