Gastón Schabas - 1 year ago
Java Question

jsoup: join some nodes and wrap them in an element

I'm new to jsoup. I'm trying to modify the following example:

text that <strong>need</strong> to be <strong>wrapped</strong>
<p>a text that has to be ignored</p>
another text that <strong>need</strong> to be <strong>wrapped</strong>

to obtain this

<p>text that <strong>need</strong> to be <strong>wrapped</strong></p>
<p>a text that has to be ignored</p>
<p>another text that <strong>need</strong> to be <strong>wrapped</strong></p>

So I need to wrap every run of text that is not already inside a <p> in a new <p>.

I've tried something like this:

Document doc = Jsoup.parse(html);
doc.body().traverse(new NodeVisitor() {
    public void head(Node node, int depth) {
        if (node instanceof TextNode && Arrays.asList("div", "body").contains(node.parentNode().nodeName())) {
            Node auxNode = node;

            while (auxNode.nextSibling() != null && Arrays.asList("em", "strong").contains(auxNode.nextSibling().nodeName())) {
                auxNode = node.nextSibling();
            }
        }
    }

    public void tail(Node node, int depth) { }
});

But I just keep getting a NullPointerException in the while condition.

Thanks in advance

at HTMLToArticleParser$1.head(
at org.jsoup.nodes.Node.traverse(
at HTMLToArticleParser.parse(
at HTMLToArticleParser_Tests.jTest(
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(
at sun.reflect.DelegatingMethodAccessorImpl.invoke(
at java.lang.reflect.Method.invoke(
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(
at org.junit.runners.model.FrameworkMethod.invokeExplosively(
at org.junit.internal.runners.statements.InvokeMethod.evaluate(
at org.junit.internal.runners.statements.RunBefores.evaluate(
at org.junit.runners.ParentRunner.runLeaf(
at org.junit.runners.BlockJUnit4ClassRunner.runChild(
at org.junit.runners.BlockJUnit4ClassRunner.runChild(
at org.junit.runners.ParentRunner$
at org.junit.runners.ParentRunner$1.schedule(
at org.junit.runners.ParentRunner.runChildren(
at org.junit.runners.ParentRunner.access$000(
at org.junit.runners.ParentRunner$2.evaluate(
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(
at com.intellij.rt.execution.junit.JUnitStarter.main(

Answer Source

Thanks to everyone. I was able to solve it like this:

class NewNode

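import java.util.List;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.parser.Tag;

public class NewNode {

    // The <p> element that will hold copies of the collected nodes.
    private Element newElement = new Element(Tag.valueOf("p"), "");
    private List<Node> childs;

    public NewNode(List<Node> childs) {
        this.childs = childs;
    }

    public Node getNewNode() {
        // Clones are appended so the originals stay in place until they are removed.
        childs.forEach(child -> newElement.appendChild(child.clone()));
        return newElement;
    }
}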

class NodesToProcess

import java.util.List;
import org.jsoup.nodes.Node;

public class NodesToProcess {

    private Node oldNode;           // the text node that starts the run
    private NewNode newNode;        // builder for the <p> that replaces it
    private List<Node> toRemove;    // the siblings that were copied into the <p>

    public NodesToProcess(Node oldNode, NewNode newNode, List<Node> toRemove) {
        this.oldNode = oldNode;
        this.newNode = newNode;
        this.toRemove = toRemove;
    }

    public Node getOldNode() {
        return oldNode;
    }

    public Node getNewNode() {
        return newNode.getNewNode();
    }

    public List<Node> getToRemove() {
        return toRemove;
    }
}


And this is the method that wraps the text that is not already wrapped:

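private void wrapUnwrappedTextInTagP(Element element) {
    List<NodesToProcess> nodesToProcesses = new ArrayList<>();
    List<Node> nodeAlreadyUsed = new ArrayList<>();

    element.childNodes().forEach(node -> {
        if (node instanceof TextNode && !nodeAlreadyUsed.contains(node)) {
            List<Node> newChilds = new ArrayList<>();
            List<Node> toRemove = new ArrayList<>();

            // Assumed: the starting text node becomes the first child of the new <p>.
            newChilds.add(node);
            Node auxNode = node.nextSibling();

            while (auxNode != null && parentIsBodyAndIsAnTextElement(auxNode)) {
                // Assumed bookkeeping: collect the sibling, mark it for removal,
                // and remember it so it does not start a run of its own.
                newChilds.add(auxNode);
                toRemove.add(auxNode);
                nodeAlreadyUsed.add(auxNode);
                auxNode = auxNode.nextSibling();
            }
            nodesToProcesses.add(new NodesToProcess(node, new NewNode(newChilds), toRemove));
        }
    });

    nodesToProcesses.forEach(nodesToProcess -> {
        nodesToProcess.getToRemove().forEach(node -> node.remove());
        // Assumed final step: swap the starting text node for the new <p>.
        nodesToProcess.getOldNode().replaceWith(nodesToProcess.getNewNode());
    });
}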
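
The helper parentIsBodyAndIsAnTextElement is not shown in the answer. Going only by its name and by the tags the question checks for (em and strong), a hypothetical version might look like the sketch below. The class name InlineNodeCheck and the tag list are assumptions, not the author's code.

```java
import java.util.Arrays;
import java.util.List;

import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class InlineNodeCheck {

    // Assumption: only these inline tags may continue a run of loose text.
    private static final List<String> INLINE_TAGS = Arrays.asList("em", "strong");

    /**
     * Returns true when the node is a direct child of <body> and is either
     * a text node or one of the inline elements listed above.
     */
    public static boolean parentIsBodyAndIsAnTextElement(Node node) {
        Node parent = node.parentNode();
        boolean parentIsBody = parent != null && "body".equals(parent.nodeName());
        boolean isTextLike = node instanceof TextNode || INLINE_TAGS.contains(node.nodeName());
        return parentIsBody && isTextLike;
    }
}
```

With a check like this, a <p> or any other block element ends the current run, which is what keeps the already wrapped paragraph untouched.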

So, in the main method:

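Document doc = Jsoup.parse(html);
wrapUnwrappedTextInTagP(doc.body()); // assumed call: apply the method to the body, as in the question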
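
For comparison, here is a compact, self-contained sketch of the same idea that moves the nodes into the new <p> instead of cloning and removing them. It is a different technique from the answer above, not a copy of it. The class name WrapLooseText, its method names and the inline tag list are assumptions; the sample HTML is the example from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.parser.Tag;

public class WrapLooseText {

    // Assumption: only these inline tags may be part of a run of loose text.
    private static final List<String> INLINE_TAGS = Arrays.asList("em", "strong");

    private static boolean isLooseInline(Node node) {
        return node instanceof TextNode || INLINE_TAGS.contains(node.nodeName());
    }

    /** Wraps each run of consecutive text/inline children of parent in a new <p>. */
    public static void wrapLooseText(Element parent) {
        // Work on a copy of the child list, because nodes are moved while iterating.
        List<Node> children = new ArrayList<>(parent.childNodes());
        List<Node> run = new ArrayList<>();
        for (Node child : children) {
            if (isLooseInline(child)) {
                run.add(child);
            } else {
                flush(run); // a block element such as <p> ends the current run
            }
        }
        flush(run);
    }

    private static void flush(List<Node> run) {
        // Skip runs that contain nothing but whitespace text nodes.
        boolean hasContent = false;
        for (Node n : run) {
            if (!(n instanceof TextNode) || !((TextNode) n).isBlank()) {
                hasContent = true;
            }
        }
        if (hasContent) {
            Element p = new Element(Tag.valueOf("p"), "");
            run.get(0).before(p);      // put the <p> where the run starts
            for (Node n : run) {
                p.appendChild(n);      // appendChild moves the node into the <p>
            }
        }
        run.clear();
    }

    public static void main(String[] args) {
        String html = "text that <strong>need</strong> to be <strong>wrapped</strong>\n"
                + "<p>a text that has to be ignored</p>\n"
                + "another text that <strong>need</strong> to be <strong>wrapped</strong>";
        Document doc = Jsoup.parse(html);
        wrapLooseText(doc.body());
        System.out.println(doc.body().html());
    }
}
```

Copying the child list before the loop is the important design choice here: the list returned by childNodes() reflects the live tree, so moving nodes while iterating over it directly would be unsafe.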