jethow jethow - 5 months ago 27
Java Question

Split up jSoup scraping result

I am scraping this link with the jSoup library in Java. My code runs fine, but how do I split up the elements I get back so that each article is handled separately?

Here is my source:

    package javaapplication1;

    import java.io.IOException;
    import java.sql.SQLException;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;

    public class coba {

        public static void main(String[] args) throws SQLException {
            MasukDB db = new MasukDB();
            try {
                Document doc = null;
                for (int page = 1; page < 2; page++) {
                    // the URL prefix is empty in the post as published
                    doc = Jsoup.connect("" + page).get();
                    // each select() runs against the whole page, so every match is combined
                    System.out.println("title : " + doc.select(".entry-title>a").text() + "\n");
                    System.out.println("link : " + doc.select(".entry-title>a").attr("href") + "\n");
                    System.out.println("body : " + doc.select(".entry-content p").text() + "\n");
                    System.out.println("date : " + doc.select(".entry-date>a").text() + "\n");
                }
            } catch (IOException e) {
                // the handler body is cut off in the post; printing the stack trace is a placeholder
                e.printStackTrace();
            }
        }
    }

In the output, all the articles on a page end up on a single line. How can I split them up? Also, how do I get the link of each article? I think my CSS selector for the link is still wrong.
Thanks!

Answer

    doc.select(".entry-title>a").text()

This searches the entire document and returns every matching link; calling text() on that result then combines the text of all of them into one string, which is why a whole page ends up on one line. What you probably want instead is to select each article first and then pull the relevant data out of each one.

    // needs org.jsoup.nodes.Element, org.jsoup.select.Elements and java.text.MessageFormat
    Document doc;
    for (int page = 1; page < 2; page++) {

        doc = Jsoup.connect("" + page).get();

        // get a list of articles on page
        Elements articles = doc.select("main#main article");

        // iterate article list
        for (Element article : articles) {

            // find the article header, which includes title and date
            Element header = article.select("header.entry-header").first();

            // find and scrape title/link from header
            Element headerTitle = header.select("h1.entry-title > a").first();
            String title = headerTitle.text();
            String link = headerTitle.attr("href");

            // find and scrape date from header
            String date = header.select("div.entry-meta > span.entry-date > a").text();

            // find and scrape every paragraph in the article content
            // you probably will want to further refine the logic here
            // there may be paragraphs you don't want to include
            String body = article.select("div.entry-content p").text();

            // view results
            System.out.println(MessageFormat.format(
                            "title={0} link={1} date={2} body={3}",
                            title, link, date, body));
        }
    }
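
If the scraped values need to be used later (for example, written to a database) rather than only printed, one option is to keep each article's four strings together in a small holder object. The sketch below uses only the standard library and placeholder values in place of real scraping. The `Article` class, its `toLine()` method, and the example.com links are hypothetical names made up for illustration; they are not part of jSoup or of the code above.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.List;

public class ArticleListDemo {

    // Hypothetical holder for the four values scraped from one article.
    static final class Article {
        final String title;
        final String link;
        final String date;
        final String body;

        Article(String title, String link, String date, String body) {
            this.title = title;
            this.link = link;
            this.date = date;
            this.body = body;
        }

        // Formats one article as a single line, using the same pattern
        // string as the "view results" step.
        String toLine() {
            return MessageFormat.format(
                    "title={0} link={1} date={2} body={3}",
                    title, link, date, body);
        }
    }

    public static void main(String[] args) {
        List<Article> articles = new ArrayList<>();

        // Placeholder values; in the real loop these would be the
        // title, link, date and body strings read from each article element.
        articles.add(new Article("First post", "https://example.com/first", "1 Jan", "Hello."));
        articles.add(new Article("Second post", "https://example.com/second", "2 Jan", "World."));

        // One output line per article instead of one line per page.
        for (Article article : articles) {
            System.out.println(article.toLine());
        }
    }
}
```

Inside the per-article loop, the `articles.add(...)` call would take the place of the print statement, and the list can then be handed to whatever stores or displays the data.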

See CSS Selectors for more examples on how to scrape this kind of data.
