Mustafa Motani - 5 months ago
Java Question

Recognizing RSS Links in HTML source code

Is there a way to recognize RSS links in HTML source? I need to write Java code that extracts the feed link from a page's HTML, but I couldn't find a single convention for how websites embed their RSS link. Some websites use type="application/rss+xml", but not all of them do, e.g. discovery.com and cnn.com. Is there an approach that works for any website?
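
For context, feed links declared in a page's head are not always typed as RSS: Atom feeds are advertised with application/atom+xml, so a check for application/rss+xml alone can miss them. Below is a minimal sketch of a type check that accepts both. The class name FeedTypes and the method isFeedType are made up for illustration, and it only covers sites that declare a type at all.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of any library.
class FeedTypes {
    // MIME types used to advertise RSS and Atom feeds in <link type="...">.
    private static final Set<String> FEED_TYPES = Set.of(
            "application/rss+xml",
            "application/atom+xml");

    /** Returns true if a link's type attribute names an RSS or Atom feed. */
    static boolean isFeedType(String type) {
        if (type == null) {
            return false;
        }
        // Drop parameters such as "; charset=utf-8" and normalise case.
        String base = type.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return FEED_TYPES.contains(base);
    }

    public static void main(String[] args) {
        System.out.println(isFeedType("application/rss+xml"));                 // true
        System.out.println(isFeedType("Application/Atom+XML; charset=utf-8")); // true
        System.out.println(isFeedType("text/css"));                            // false
    }
}
```

With an HTML parser, the value passed in would be the type attribute of each link element found in the page.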

Answer

I have solved my problem for the time being, but I would appreciate it if anyone could suggest more concise and efficient code. My code is not very efficient: it is one long chain of fallbacks for a small problem. The first part of my code is taken from user911236's post on Stack Overflow.

MY CODE:

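// Imports needed by the three methods below (they go at the top of the file).
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Returns the page's RSS feed URL, "No URL found" if no strategy matches,
// or an empty string if the page could not be fetched.
public static String RSSLinkRetriever(String url) {

    String rssUrl = "";
    try {
        Document doc = Jsoup.connect(url).get();

        // 1. Feed link declared as <link type="application/rss+xml" href="...">
        Elements links = doc.select("link[type=application/rss+xml]");
        if (links.size() > 0) {
            // "abs:href" resolves a relative href against the page URL
            rssUrl = links.get(0).attr("abs:href");
        } else {
            // 2. Follow a "News" link and look for a feed there.
            // Each helper is called once, and strings are compared by content, not with !=
            rssUrl = rssURLNews(url);
            if (rssUrl.isEmpty() || rssUrl.equals("No URL found")) {
                // 3. Look for a link labelled "RSS"
                rssUrl = rssURLrss(url);
            }
            if (rssUrl.isEmpty()) {
                rssUrl = "No URL found";
            }
        }
    } catch (IOException ex) {
        Logger.getLogger(RSSReader.class.getName()).log(Level.SEVERE, null, ex);
    }

    return rssUrl;

}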

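// Follows the first link whose text is "News" and searches that page for a feed.
// Returns an empty string if no feed is found that way.
public static String rssURLNews(String url) {

    String str = "";
    try {
        Document doc = Jsoup.connect(url).get();

        // get all links
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            // "abs:href" resolves relative links such as "/news" against the page URL
            String newsUrl = link.attr("abs:href");
            // Skip a link that points back to this page; following it would recurse without end
            if (link.text().equals("News") && !newsUrl.isEmpty() && !newsUrl.equals(url)) {
                String found = RSSLinkRetriever(newsUrl);
                // RSSLinkRetriever reports failure as "No URL found"; treat that as no result
                if (!found.isEmpty() && !found.equals("No URL found")) {
                    str = found;
                }
                break;
            }
        }

    } catch (IOException e) {
        e.printStackTrace();
    }
    return str;
}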

// Returns the target of the first link whose text is "RSS", or an empty string.
public static String rssURLrss(String url) {

    String str = "";
    try {
        Document doc = Jsoup.connect(url).get();

        // get all links
        Elements links = doc.select("a[href]");
        for (Element link : links) {
            if (link.text().equals("RSS")) {
                // "abs:href" turns a relative href into an absolute URL
                str = link.attr("abs:href");
                break;
            }
        }

    } catch (IOException e) {
        e.printStackTrace();
    }
    return str;
}
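
One possible way to shorten the chain of fallbacks above is to treat each lookup as a strategy and return the first one that yields a result. The sketch below shows only that pattern and uses nothing beyond the standard library. FeedFinder and firstMatch are hypothetical names, and the strategies in main are stand-ins that work on a plain string. In the real code the input would presumably be the jsoup Document, fetched once and handed to every strategy instead of being downloaded again in each method.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper illustrating a "first strategy that succeeds" chain.
class FeedFinder {
    /**
     * Applies each strategy to the same input, in order, and returns the
     * first non-empty result. Later strategies run only if earlier ones fail.
     */
    static <T> Optional<String> firstMatch(T page,
                                           List<Function<T, Optional<String>>> strategies) {
        for (Function<T, Optional<String>> strategy : strategies) {
            Optional<String> result = strategy.apply(page);
            if (result.isPresent()) {
                return result;
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in strategies: the first never matches, the second looks for "feed.xml".
        List<Function<String, Optional<String>>> strategies = List.of(
                html -> Optional.<String>empty(),
                html -> html.contains("feed.xml")
                        ? Optional.of("/feed.xml")
                        : Optional.<String>empty());

        String html = "<a href=\"/feed.xml\">RSS</a>";
        System.out.println(firstMatch(html, strategies).orElse("No URL found"));
    }
}
```

Returning an Optional instead of an empty string or "No URL found" also removes the need to compare result strings at each step.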