Ahmed Ahmed - 21 days ago
Java Question

Java: How can I use jsoup to extract headlines from a news page?

I want to get the first headline and print it. So far, I have looked through the HTML and found a way to search for the headlines.

data-pb-placeholder="Write headline here"


That attribute usually appears just before any headline I want. So far I have:

Document doc = Jsoup.connect("http://www.washingtonpost.com").get();
Element headline = doc.select("headline").first();
System.out.println(headline);


It is only outputting null. I'm not sure how I can search through the doc and find headlines.

Answer Source

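The selector "headline" matches elements whose tag name is headline, not elements with that class, which is why first() returns null. It looks like the headlines are all under <div class="headline">. Use the CSS selector div.headline to target them, then call text() to extract their text.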

    Document doc = Jsoup.connect("http://www.washingtonpost.com").get();

    // Print the text of every <div class="headline"> on the page.
    for (Element headline : doc.select("div.headline")) {
        System.out.println(headline.text());
    }
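
Since the question asks for only the first headline, here is a minimal sketch of that case. It parses an inline HTML string rather than fetching the live page, so the markup, the class name HeadlineExtractor, and the helper firstHeadline are all illustrative assumptions, not the site's actual structure. It also guards against the null that first() returns when nothing matches.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class HeadlineExtractor {

    // Hypothetical helper: returns the text of the first headline, or null if none is found.
    static String firstHeadline(String html) {
        Document doc = Jsoup.parse(html);
        // Matches either a <div class="headline"> or any element carrying the
        // placeholder attribute from the question; results are in document order.
        Element headline = doc
                .select("div.headline, [data-pb-placeholder=\"Write headline here\"]")
                .first();
        // first() returns null when the selector matches nothing, so check before using it.
        return headline == null ? null : headline.text();
    }

    public static void main(String[] args) {
        // Made-up sample markup standing in for the real page.
        String html = "<div class=\"headline\"><a href=\"#\">First story</a></div>"
                    + "<div class=\"headline\"><a href=\"#\">Second story</a></div>";
        System.out.println(firstHeadline(html));
    }
}
```

To run this against the real site, replace Jsoup.parse(html) with Jsoup.connect(url).get(). If the page builds its headlines with JavaScript, the fetched HTML may not contain them, since jsoup does not execute scripts.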