Mahen dran - 1 month ago
HTML Question

How can I remove HTML elements while retaining the formatting?

I am using the JavaMail API to read the body of a message and store it in a text file if it has any content.

I can read the body of the message, but it comes back with HTML elements in it.

This is the code I am using:

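import java.util.Properties;
import javax.mail.*;
import javax.mail.search.FlagTerm;
import org.jsoup.Jsoup;

Properties props = System.getProperties();
props.setProperty("mail.store.protocol", "imaps");

Session session = Session.getDefaultInstance(props, null);
Store store = session.getStore("imaps");
store.connect("hostname", "username", "password");
String result = null;
Folder inbox = store.getFolder("Inbox");
inbox.open(Folder.READ_ONLY);
// Search for unread messages (SEEN flag not set)
Message[] messages = inbox.search(new FlagTerm(new Flags(Flags.Flag.SEEN), false));
for (Message message : messages) {
    // Jsoup.parse expects a String; getContent() returns a String only for single-part text messages
    Object content = message.getContent();
    if (content instanceof String) {
        System.out.println(Jsoup.parse((String) content).text());
    }
}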


How can I remove the HTML elements from the retrieved message?

Can anyone help me solve this?

Answer

To remove all HTML tags from your mail, use jsoup's text() method.

Example Code

String htmlString = "<div class=\"WordSection1\"> <p class=\"MsoNormal\">Hi<br> <br> <br> <br> Data is written in this mail.<br> <br> <br> <br> <o:p></o:p></p> </div>";

System.out.println(Jsoup.parse(htmlString).text());

Output

Hi Data is written in this mail.
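
The question also asks to store the body in a text file only when it has content. A minimal sketch of that step, assuming the stripped text is already available as a String; the class name BodyWriter, the method writeIfNotBlank and the temp-file target are hypothetical names chosen for illustration, and only JDK classes are used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BodyWriter {
    /**
     * Writes the text to the target file only when it contains something
     * other than whitespace. Returns true if the file was written.
     */
    public static boolean writeIfNotBlank(String text, Path target) throws IOException {
        if (text == null || text.trim().isEmpty()) {
            return false; // nothing worth storing
        }
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical target; in the mail loop this could be one file per message.
        Path target = Files.createTempFile("mail-body", ".txt");
        boolean written = writeIfNotBlank("Hi Data is written in this mail.", target);
        System.out.println(written + " -> " + target);
    }
}
```

In the mail loop, the argument would be the result of Jsoup.parse(...).text() for each message.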

If specific elements should produce line breaks, as they do in the rendered HTML, you can insert line breaks before those tags and then disable pretty-printing when you call jsoup's clean method.

prettyPrint

If disabled, the HTML output methods will not re-format the output, and the output will generally look like the input.

Example Code

String htmlString = "<div class=\"WordSection1\"> <p class=\"MsoNormal\">Hi<br> <br> <br> <br> Data is written in this mail.<br> <br> <br> <br> <o:p></o:p></p> </div>";

htmlString = htmlString.replaceAll("<br>", System.getProperty("line.separator") + "<br>"); // do replacements for all tags that should result in line-breaks

Document.OutputSettings settings = new Document.OutputSettings();
settings.prettyPrint(false); // to keep line-breaks

// Note: newer jsoup releases rename Whitelist to Safelist
String cleanedSource = Jsoup.clean(htmlString, "", Whitelist.none(), settings);

System.out.println(cleanedSource);

Output

 Hi



 Data is written in this mail.
[... four more empty lines]
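
The replaceAll call above only matches the literal <br>. Mail clients often emit variants such as <br/>, <BR> or rely on </p> to end a line, so a slightly more tolerant pre-processing step may be useful. The following is a sketch using only java.util.regex; the class name LineBreakMarker and the method markLineBreaks are hypothetical, and the pattern deliberately ignores tags that carry attributes (for example <br class="x">).

```java
import java.util.regex.Pattern;

public class LineBreakMarker {
    // Matches <br>, <br/>, <br /> and closing </p>, in any letter case.
    private static final Pattern BREAK_TAGS =
            Pattern.compile("<br\\s*/?>|</p\\s*>", Pattern.CASE_INSENSITIVE);

    /** Inserts a line separator in front of every tag that should end a line. */
    public static String markLineBreaks(String html) {
        // "$0" re-inserts the matched tag, so the HTML itself is left unchanged.
        return BREAK_TAGS.matcher(html).replaceAll(System.lineSeparator() + "$0");
    }

    public static void main(String[] args) {
        String html = "<p>Hi<BR/>Data is written in this mail.</p>";
        System.out.println(markLineBreaks(html));
    }
}
```

The returned string would then be passed to Jsoup.clean with prettyPrint disabled, exactly as in the example above, so that the inserted separators survive while the tags are stripped.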