Premraj - 7 months ago
Java Question

Which is the best library for XML parsing in Java?

I'm looking for a Java library for parsing XML (complex configuration and data files). I googled a bit but couldn't find anything other than dom4j (it seems they are working on V2). I took a look at Commons Configuration but didn't like it, and the other Apache XML projects seem to be in hibernation. I haven't evaluated dom4j myself, so I just wanted to know: does Java have other (good) open source XML parsing libraries? And how has your experience with dom4j been?

After @Voo's answer, let me ask another question: should I use Java's built-in classes or a third-party library like dom4j? What are the advantages?

Voo
Answer

Actually Java supports 4 methods to parse XML out of the box:

DOM Parser/Builder: The whole XML structure is loaded into memory and you can use the well-known DOM methods to work with it. DOM also allows you to write the document out again using XSLT transformations (javax.xml.transform). Example:

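    // Needs: java.io.*, javax.xml.parsers.*, org.w3c.dom.Document, org.xml.sax.SAXException
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true);                        // validate against the DTD the document declares
    factory.setIgnoringElementContentWhitespace(true);  // only takes effect in validating mode
    try {
        DocumentBuilder builder = factory.newDocumentBuilder();
        File file = new File("test.xml");
        Document doc = builder.parse(file);
        // Do something with the document here.
    } catch (ParserConfigurationException | SAXException | IOException e) {
        e.printStackTrace();  // at least report the error instead of swallowing it
    }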
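
To make "do something with the document" a bit more concrete, here is a minimal sketch of reading values through the DOM methods and writing the tree back out with an identity transform. The class name `DomSketch`, the `<item name="...">` layout and the helper methods are made up for illustration; they are not from the lecture slides.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, only for illustration.
class DomSketch {

    // Parses an XML string into an in-memory DOM tree (no validation).
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Collects the "name" attribute of every <item> element, in document order.
    static List<String> itemNames(Document doc) {
        List<String> names = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            names.add(item.getAttribute("name"));
        }
        return names;
    }

    // Serializes the (possibly modified) tree with an identity transform.
    static String toXml(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }
}
```

Typical use would be `DomSketch.itemNames(DomSketch.parse(xml))`. Since the whole tree sits in memory, you can also change it (for example with `setAttribute`) before calling `toXml`.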

SAX Parser: Solely for reading an XML document. The SAX parser runs through the document and calls the user's callback methods. There are methods for the start/end of a document, of an element and so on. They're defined in org.xml.sax.ContentHandler, and there's an empty helper class, org.xml.sax.helpers.DefaultHandler.

    // Needs: java.io.*, javax.xml.parsers.*, org.xml.sax.SAXException
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(true);
    try {
        SAXParser saxParser = factory.newSAXParser();
        File file = new File("test.xml");
        saxParser.parse(file, new ElementHandler());    // specify handler (your DefaultHandler subclass)
    } catch (ParserConfigurationException | SAXException | IOException e) {
        e.printStackTrace();  // at least report the error instead of swallowing it
    }
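
The handler is where the actual work happens. As a rough sketch, a handler could look like the one below, which just counts how often each element name occurs. `ElementCountingHandler` and its `count` helper are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts how often each element name occurs.
class ElementCountingHandler extends DefaultHandler {
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Called by the parser for every opening tag it encounters.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        counts.merge(qName, 1, Integer::sum);
    }

    Map<String, Integer> counts() {
        return counts;
    }

    // Convenience method: parses the given XML string and returns the counts.
    static Map<String, Integer> count(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            ElementCountingHandler handler = new ElementCountingHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.counts();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Only the callbacks you care about need to be overridden; DefaultHandler supplies empty implementations for the rest. Any state you want to keep between callbacks has to live in the handler itself.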

StAX Reader/Writer: This works with a stream-oriented interface. The program asks for the next element when it's ready, just like a cursor/iterator. You can also create documents with it. Read document:

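    // Needs: java.io.*, javax.xml.stream.*
    try (FileInputStream fis = new FileInputStream("test.xml")) {  // stream is closed automatically
        XMLInputFactory xmlInFact = XMLInputFactory.newInstance();
        XMLStreamReader reader = xmlInFact.createXMLStreamReader(fis);
        while (reader.hasNext()) {
            reader.next(); // returns the event type; do something here
        }
        reader.close();    // frees the reader, does not close the underlying stream
    } catch (IOException | XMLStreamException exc) {
        exc.printStackTrace();  // at least report the error instead of swallowing it
    }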
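
As an illustration of what the loop body might do, the sketch below pulls events one by one and collects the text of every `<name>` element. The class `StaxReadSketch` and the `<name>` element are assumptions made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, only for illustration.
class StaxReadSketch {

    // Returns the text content of every <name> element, in document order.
    static List<String> readNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();  // we ask for the next event ourselves
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "name".equals(reader.getLocalName())) {
                        // Reads up to the matching end tag and returns the text in between.
                        names.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not read XML", e);
        }
        return names;
    }
}
```

Unlike SAX, the code decides when to advance, so it is easy to stop early or to hand the reader to another method partway through the document.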

Write document:

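    // Needs: java.io.*, javax.xml.stream.*
    try (FileOutputStream fos = new FileOutputStream("test.xml")) {  // stream is closed automatically
        XMLOutputFactory xmlOutFact = XMLOutputFactory.newInstance();
        XMLStreamWriter writer = xmlOutFact.createXMLStreamWriter(fos);
        writer.writeStartDocument();
        writer.writeStartElement("test");
        // write stuff
        writer.writeEndElement();
        writer.writeEndDocument();
        writer.flush();
        writer.close();  // frees the writer, does not close the underlying stream
    } catch (IOException | XMLStreamException exc) {
        exc.printStackTrace();  // at least report the error instead of swallowing it
    }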
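
For the "write stuff" part, a small sketch: the method below writes a map as nested elements with an attribute and text content. `StaxWriteSketch` and the `settings`/`setting`/`key` names are invented for this example.

```java
import java.io.StringWriter;
import java.util.Map;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class, only for illustration.
class StaxWriteSketch {

    // Writes each map entry as <setting key="...">value</setting> inside <settings>.
    static String writeSettings(Map<String, String> settings) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("settings");
            for (Map.Entry<String, String> entry : settings.entrySet()) {
                writer.writeStartElement("setting");
                writer.writeAttribute("key", entry.getKey());
                writer.writeCharacters(entry.getValue());  // escapes characters such as '<' and '&'
                writer.writeEndElement();
            }
            writer.writeEndElement();
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }
}
```

The writer takes care of escaping text and attribute values, but it does not indent or pretty-print anything by itself.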

JAXB: The newest of the four APIs for reading XML documents; version 2 is part of Java 6. (It was removed from the JDK in Java 11, so on newer versions it has to be added as a separate dependency.) It lets us deserialize Java objects from a document. You read the document with a class that implements the javax.xml.bind.Unmarshaller interface (you get one from a JAXBContext created with JAXBContext.newInstance). The context has to be initialized with the classes used, but you only have to specify the root classes and don't have to worry about statically referenced classes. You use annotations to specify which classes should be elements (@XmlRootElement) and which fields are elements (@XmlElement) or attributes (@XmlAttribute, what a surprise!).

    // Needs: java.io.*, javax.xml.bind.*
    try (FileInputStream adrFile = new FileInputStream("test")) {  // stream is closed automatically
        JAXBContext ctx = JAXBContext.newInstance(RootElementClass.class);
        Unmarshaller um = ctx.createUnmarshaller();
        RootElementClass adr = (RootElementClass) um.unmarshal(adrFile);
        // work with adr here
    } catch (IOException | JAXBException exc) {
        exc.printStackTrace();  // at least report the error instead of swallowing it
    }

Write document:

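    // Needs: java.io.*, javax.xml.bind.*
    try (FileOutputStream adrFile = new FileOutputStream("test.xml")) {  // stream is closed automatically
        JAXBContext ctx = JAXBContext.newInstance(RootElementClass.class);
        Marshaller ma = ctx.createMarshaller();
        ma.marshal(..);  // arguments elided in the original
    } catch (IOException | JAXBException exc) {
        exc.printStackTrace();  // at least report the error instead of swallowing it
    }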

Examples shamelessly copied from some old lecture slides ;-)

Edit: About "which API should I use?". Well, it depends. Not all APIs have the same capabilities, as you can see, but if you have control over the classes you use to map the XML document, JAXB is my personal favorite: a really elegant and simple solution (though I haven't used it for really large documents, where it could get a bit complex). SAX is pretty easy to use too. Just stay away from DOM if you don't have a really good reason to use it; it's an old, clunky API in my opinion. I don't think there are any modern 3rd-party libraries that offer anything especially useful that's missing from the standard library, and the standard libraries have the usual advantages of being extremely well tested, documented and stable.
