joe90 joe90 - 4 months ago 7
Java Question

Fast, lightweight XML parser

I have a specific format XML document that I will get pushed. This document will always be the same type so it's very strict.

I need to parse this so that I can convert it into JSON (well, a slightly bastardized version so someone else can use it with DOJO).

My question is: should I use a very fast, lightweight XML parser (no need for SAX, etc.; any ideas?) or write my own, basically reading the document into a StringBuffer and spinning through the characters? Under the covers, I assume all XML parsers spin through the string (or memory buffer) and parse, producing output on the way through.

Thanks

edit

The XML will be anywhere from 3 or 4 lines to about 50 at most (at the extreme).

Answer

No, you should not try to write your own XML parser for this.

SAX itself is very lightweight and fast, so I'm not sure why you think it's too much. Also, using a string buffer would actually be much less scalable than using SAX, because SAX doesn't require you to load the whole XML file into memory to use it. I've used SAX to parse through multigigabyte XML files, which you wouldn't be able to do using string buffers on a 32-bit machine.
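To give a feel for how little code SAX needs, here is a minimal sketch using the JDK's built-in javax.xml.parsers API. The class name SaxSketch, the method elementNames and the sample XML are made up for illustration; the handler simply records each element name as the parser streams past it, without ever building a tree in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of any library.
class SaxSketch {

    // Returns the element names in the order SAX reports their start tags.
    static List<String> elementNames(String xml) throws Exception {
        final List<String> names = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // Called once per opening tag as the input is streamed.
                names.add(qName);
            }
        };

        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this example.
        String xml = "<order><item id=\"1\">tea</item><item id=\"2\">milk</item></order>";
        System.out.println(elementNames(xml)); // prints [order, item, item]
    }
}
```

For a real conversion you would probably also override characters() and endElement() to collect text and emit the output as you go; that part depends on the target format and is left out here.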

If you have small files and you don't need to worry about performance, look into using the DOM. Java's implementation can be kind of annoying to use (you create a document by using a DocumentBuilder, which comes from a DocumentBuilderFactory).

The code to create a document from a file looks like this:

Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File("file.xml"));

(Note that keeping a reference to your DocumentBuilder will speed things up if you need to parse multiple files.)
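One way to keep that reference around is a small wrapper that creates the builder once and reuses it for every document. This is only a sketch: the class name ReusableDomParser is made up, it parses from a String rather than a file to stay self-contained, and it assumes the parser is used from a single thread, since the JDK does not guarantee that a DocumentBuilder is thread-safe.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical wrapper class, not part of any library.
class ReusableDomParser {

    // Created once; the factory lookup and builder setup are the costly part.
    private final DocumentBuilder builder;

    ReusableDomParser() throws Exception {
        builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Parses one document with the shared builder. Assumes single-threaded use.
    Document parse(String xml) throws Exception {
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        ReusableDomParser parser = new ReusableDomParser();
        // Two invented documents parsed with the same builder.
        Document first = parser.parse("<a/>");
        Document second = parser.parse("<b><c/></b>");
        System.out.println(first.getDocumentElement().getTagName());
        System.out.println(second.getDocumentElement().getTagName());
    }
}
```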

Then you use the methods of org.w3c.dom.Document to read or manipulate the contents. For example, getElementsByTagName() returns all the Elements with a given tag name.
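As a rough illustration of reading values that way, the sketch below pulls the text out of every element with a given tag name. The class name DomSketch, the method textOf and the sample XML are assumptions made for this example; only the DocumentBuilder, Document, NodeList and Node calls come from the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class DomSketch {

    // Returns the text content of every element named tagName, in document order.
    static List<String> textOf(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Sample document invented for this example.
        String xml = "<order><item>tea</item><item>milk</item></order>";
        System.out.println(textOf(xml, "item")); // prints [tea, milk]
    }
}
```

For a document of 50 lines or fewer, a loop like this over the handful of tags you care about is likely all the "parser" you need before writing out the JSON.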