phil swenson phil swenson - 5 months ago 16
Java Question

How do I extract child element from XML to a string in Java?

If I have an XML document like

<root>
<element1>
<child attr1="blah">
<child2>blahblah</child2>
<child>
</element1>
</root>


I want to get an XML string with the first child element. My output string would be

<element1>
<child attr1="blah">
<child2>blahblah</child2>
<child>
</element1>


There are many approaches, would like to see some ideas. I've been trying to use Java XML APIs for it, but it's not clear that there is a good way to do this.

thanks

Answer

You're right, with the standard XML API, there's not a good way - here's one example (may be bug ridden; it runs, but I wrote it a long time ago).

import javax.xml.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.stream.*;
import org.w3c.dom.*;
import java.io.*;

public class Proc
{
    public static void main(String[] args) throws Exception
    {
        //Parse the input document
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File("in.xml"));

        //Set up the transformer to write the output string
        TransformerFactory tFactory = TransformerFactory.newInstance();
        Transformer transformer = tFactory.newTransformer();
        transformer.setOutputProperty("indent", "yes");
        StringWriter sw = new StringWriter();
        StreamResult result = new StreamResult(sw);

        //Find the first child node - this could be done with xpath as well
        NodeList nl = doc.getDocumentElement().getChildNodes();
        DOMSource source = null;
        for(int x = 0;x < nl.getLength();x++)
        {
            Node e = nl.item(x);
            if(e instanceof Element)
            {
                source = new DOMSource(e);
                break;
            }
        }

        //Do the transformation and output
        transformer.transform(source, result);
        System.out.println(sw.toString());
    }
}

It would seem like you could get the first child just by using doc.getDocumentElement().getFirstChild(), but the problem with that is if there is any whitespace between the root and the child element, that will create a Text node in the tree, and you'll get that node instead of the actual element node. The output from this program is:

D:\home\tmp\xml>java Proc
<?xml version="1.0" encoding="UTF-8"?>
<element1>
        <child attr1="blah">
           <child2>blahblah</child2>
       </child>
   </element1>

I think you can suppress the xml version string if you don't need it, but I'm not sure on that. I would probably try to use a third party XML library if at all possible.