Aparna I - 1 month ago
HTML Question

Avoid removal of spaces and newlines while parsing HTML using jsoup

I have the sample code below.

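// A Java string literal cannot span lines, so the newlines are written as \n.
String sample = "<html>\n"
        + "<head>\n"
        + "</head>\n"
        + "<body>\n"
        + "This is a sample on parsing html body using jsoup\n"
        + "This is a sample on parsing html body using jsoup\n"
        + "</body>\n"
        + "</html>";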

Document doc = Jsoup.parse(sample);
String output = doc.body().text();


I get the output as

This is a sample on parsing html body using jsoup This is a sample on parsing html body using jsoup


But I want the output as

This is a sample on parsing html body using jsoup
This is a sample on parsing html body using jsoup


How do I parse it so that I get this output? Or is there another way to do this in Java?

Answer

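.text() normalizes whitespace, so the line break between the two sentences is collapsed into a single space. To keep the whitespace as it appears in the source, disable pretty printing on the document and read the body with .html() instead of .text(). Note that .html() returns the body's inner HTML, so any tags inside the body will appear in the output as well.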

Document doc = Jsoup.parse(sample);
doc.outputSettings(new Document.OutputSettings().prettyPrint(false));
String output = doc.body().html();
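
To see why .text() drops the line break, here is a rough plain-Java approximation of the whitespace normalization it performs. This is only a sketch: collapseWhitespace is a hypothetical helper written for illustration, not part of jsoup, and jsoup's real implementation works differently internally. It just shows the effect of collapsing every run of whitespace (newlines included) into a single space and trimming the ends.

```java
public class WhitespaceDemo {

    // Hypothetical helper (not a jsoup API): collapses each run of whitespace,
    // including newlines, into one space and trims the ends. This approximates
    // the normalization that makes the two lines run together.
    public static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Roughly the text content of the <body> element from the question.
        String body = "\nThis is a sample on parsing html body using jsoup\n"
                + "This is a sample on parsing html body using jsoup\n";

        // The newline between the two sentences becomes a single space.
        System.out.println(collapseWhitespace(body));
    }
}
```

Because this normalization is applied to the text itself, the line break cannot be recovered from the result afterwards, which is why the un-normalized content has to be read from the document instead.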