neptune neptune - 1 month ago 10
Java Question

How to parse multiple, consecutive xml files in one document?

I have a big text file that is a sequence of XML-valid documents that looks something like this:

<DOC>
<TEXT> ... </TEXT>
...
</DOC>
<DOC>
<TEXT> ... </TEXT>
...
</DOC>


etc. There is no <?xml version="1.0"?> declaration; the <DOC></DOC> pair delimits each separate XML document. What's the best way to parse this in Java and get the values under <TEXT> in each <DOC>?

If I pass the whole thing to a DocumentBuilder, I get an error saying the document is not well formed. Is there a better solution than simply traversing the file and building a string for each <DOC>?

Answer

A well-formed XML document must have exactly one root element, and every other element must be nested inside it. Your file has one top-level <DOC> element after another, which is why the parser rejects it. Have a look at the XML specification (see point 2).

So, to overcome your issue, you can read the whole content of your text file into a String (or StringBuffer/StringBuilder) and put that string between <root> and </root> tags, for example:

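// readContentFromTextFile stands for however you load the file into a String
String origXML = readContentFromTextFile(fileName);
// wrap the <DOC> fragments in a single synthetic root element
String validXML = "<root>" + origXML + "</root>";
// parse validXML, e.g. with a DocumentBuilder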
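
To make the last step concrete, here is a minimal sketch of the wrap-and-parse idea using the standard JAXP DocumentBuilder. The class name MultiDocParser and the method extractTexts are made up for illustration, and the sketch assumes the fragments contain no XML declaration or DOCTYPE of their own (either one would be illegal inside the synthetic root).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
public class MultiDocParser {

    // Wraps the concatenated <DOC> fragments in a synthetic <root> element
    // and returns the text content of every <TEXT> element, in document order.
    public static List<String> extractTexts(String fragments) throws Exception {
        String wrapped = "<root>" + fragments + "</root>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(wrapped)));

        // Collect every <TEXT> element anywhere under the synthetic root.
        NodeList nodes = doc.getElementsByTagName("TEXT");
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<DOC><TEXT>first</TEXT></DOC>\n<DOC><TEXT>second</TEXT></DOC>";
        System.out.println(extractTexts(sample)); // prints [first, second]
    }
}
```

Since you say the file is big, note that this version holds the whole file in memory as a String and again as a DOM tree. If that becomes a problem, one option is to feed the parser a java.io.SequenceInputStream that chains a "<root>" stream, the file stream, and a "</root>" stream, and to use a SAX or StAX parser instead of DOM.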