Stephane Hatgis-Kessell - 11 months ago
Java Question

Java - How do I extract Google News Titles and Links using Jsoup?

I am very new to jsoup and HTML. How can I extract the titles and, if possible, the links of the stories on the front page of Google News? Here is my code:

org.jsoup.nodes.Document doc = null;
try {
    doc = (org.jsoup.nodes.Document) Jsoup.connect("").get();
} catch (IOException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
Elements titles = doc.select("titletext");

System.out.println("Titles: " + titles.text());

//non existent
for (org.jsoup.nodes.Element e : titles) {
    System.out.println("Title: " + e.text());
    System.out.println("Link: " + e.attr("href"));
}

For some reason I think my program is unable to find titletext, since this is the output when the code runs:

I would really appreciate your help, thanks.

Answer Source

First, select all elements with the h2 HTML tag:

Elements elem = doc.select("h2");
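As a rough illustration of why selecting by h2 can work where "titletext" does not: in jsoup's CSS selector syntax a bare word matches a tag name, while a leading dot matches a class. The sketch below runs against a small inline HTML string rather than the live page. The markup, the class name HeadingSelectSketch, and the helper methods are all hypothetical, and the assumption that titletext is a class on the real page is a guess, not something the question confirms.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.List;

public class HeadingSelectSketch {

    // Hypothetical markup for illustration only; the real page will differ.
    static final String SAMPLE_HTML =
            "<html><body>"
          + "<h2><a href=\"/story-1\"><span class=\"titletext\">First story</span></a></h2>"
          + "<h2><a href=\"/story-2\"><span class=\"titletext\">Second story</span></a></h2>"
          + "<p>Not a headline</p>"
          + "</body></html>";

    // Returns the text of every <h2> element in the given HTML.
    static List<String> headingTexts(String html) {
        Document doc = Jsoup.parse(html);
        Elements headings = doc.select("h2"); // bare word = tag-name selector
        List<String> texts = new ArrayList<>();
        for (Element h : headings) {
            texts.add(h.text());
        }
        return texts;
    }

    // Counts the elements matched by an arbitrary CSS selector.
    static int countMatches(String html, String selector) {
        return Jsoup.parse(html).select(selector).size();
    }

    public static void main(String[] args) {
        System.out.println(headingTexts(SAMPLE_HTML));
        // "titletext" looks for a <titletext> tag; ".titletext" looks for the class.
        System.out.println(countMatches(SAMPLE_HTML, "titletext"));
        System.out.println(countMatches(SAMPLE_HTML, ".titletext"));
    }
}
```

On this sample the tag selector "titletext" matches nothing, while ".titletext" and "h2" each match both stories. Parsing a string with Jsoup.parse keeps the sketch independent of the network and of whatever markup the live page currently serves.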

Each of these elements has child element(s) carrying the data (id, href, originalhref and so on). Loop over the elements and retrieve whichever of these values you need:

 for(Element e: elem){