I am writing a Java program to read an XML file, actually an iTunes library, which is in XML plist format.
I have managed to get round most obstacles that this format throws up except when it encounters text containing the
There is something fishy about what you are trying to do.
If the file format you are trying to parse contains bare ampersand (&) characters then it is not well-formed XML. Ampersands are represented as character entities (e.g. &amp;) in well-formed XML.
If it is really supposed to be real XML, then there is a bug in whatever wrote / generated the file.
If it is not supposed to be real XML (i.e. those ampersands are not a mistake), then you probably shouldn't be trying to parse it using an XML parser.
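If you want to check which of those two cases you are in, a quick well-formedness test is one way to do it. This is only a rough sketch: the class and method names (WellFormedCheck, isWellFormed) are made up for illustration, and it assumes the JDK's default SAX parser, which reports a bare ampersand as a fatal error.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class WellFormedCheck {
    // Returns true if the parser accepts the text, false if it reports a SAX error.
    static boolean isWellFormed(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // DefaultHandler.fatalError() rethrows, so malformed input ends up here.
            return false;
        }
    }
}
```

With this, text containing a bare & should be rejected, while the same text with &amp; should be accepted.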
Ah, I see. The XML is actually correctly encoded, but you didn't get the SO markup right.
It would appear that your real problem is that your characters(...) callback is being called separately for the text before the &, for the (decoded) &, and finally for the text after the &. You simply have to deal with this by joining the text chunks back together.
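Joining the chunks usually means appending to a buffer in characters(...) and only reading the buffer when the element ends. Here is a minimal sketch of that idea. The class name StringCollector and its collect(...) helper are invented for this example, and it assumes you only care about the text of <string> elements, as in a plist; adapt it to whatever your handler actually tracks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <string> element.
class StringCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> values = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start a fresh buffer for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append rather than assign.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("string".equals(qName)) {
            values.add(text.toString()); // all chunks are joined by now
        }
        text.setLength(0);
    }

    List<String> values() {
        return values;
    }

    // Convenience wrapper (an assumption for this example): parse XML held in a String.
    static List<String> collect(String xml) throws Exception {
        StringCollector handler = new StringCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values();
    }
}
```

The point of the design is that nothing is read out of the buffer until endElement(...), so it does not matter how many pieces the parser chooses to deliver the text in.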
The javadoc for ContentHandler.characters() says this:
"The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks ...".