user755806 - 1 month ago
Java Question

SAX Parser characters method doesn't collect all content

I'm using a SAX parser to parse XML, and it is working fine.

I have the following tag in my XML:

<value>•CERTASS >> Certass</value>


Here I expect '•CERTASS >> Certass' as output, but the code below returns only 'Certass'. Is there an issue with the special characters in the value tag?

 public void characters(char[] buffer, int start, int length) {
     temp = new String(buffer, start, length);
 }

Answer

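It is not guaranteed that characters() will be called only once per element. A SAX parser may deliver a single run of text in several chunks, for example at internal buffer boundaries or around entity references.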

If you store the content by assigning it to a String and characters() is called twice, you only keep the content from the second call: the second call overwrites what the first call stored in your temp variable. That is why you see only 'Certass'.
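To see the effect in isolation, the sketch below calls characters() directly with two chunks, imitating a parser that splits one text node. The split point and the class names (ChunkingDemo, OverwritingHandler, AppendingHandler, feed) are assumptions made for illustration; a real parser decides for itself where to split.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo: we invoke characters() ourselves with two chunks,
// imitating a parser that reports one text node in two callbacks.
class ChunkingDemo {

    // Mirrors the question's code: each call replaces the previous value
    static class OverwritingHandler extends DefaultHandler {
        String temp = "";

        @Override
        public void characters(char[] buffer, int start, int length) {
            temp = new String(buffer, start, length);
        }
    }

    // Accumulates every chunk instead of replacing it
    static class AppendingHandler extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] buffer, int start, int length) {
            text.append(buffer, start, length);
        }
    }

    // Feeds the chunks to a handler, one characters() call per chunk
    static void feed(DefaultHandler handler, String... chunks) {
        for (String chunk : chunks) {
            char[] chars = chunk.toCharArray();
            try {
                handler.characters(chars, 0, chars.length);
            } catch (SAXException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        // Assumed split point; \u2022 is the bullet character '•'
        String[] chunks = {"\u2022CERTASS >> ", "Certass"};

        OverwritingHandler overwriting = new OverwritingHandler();
        feed(overwriting, chunks);
        System.out.println("overwrite: " + overwriting.temp);

        AppendingHandler appending = new AppendingHandler();
        feed(appending, chunks);
        System.out.println("append:    " + appending.text);
    }
}
```

The overwriting handler ends up holding only the last chunk, while the appending one holds the whole text, regardless of where the split happens.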

To fix this, append each chunk to a StringBuilder in characters() and process the accumulated text in endElement(). For example:

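 import org.xml.sax.Attributes;
 import org.xml.sax.SAXException;
 import org.xml.sax.helpers.DefaultHandler;

 DefaultHandler handler = new DefaultHandler() {
     // Collects every chunk of text reported for the current element
     private final StringBuilder stringBuilder = new StringBuilder();

     @Override
     public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
         // Start with an empty buffer for each new element
         stringBuilder.setLength(0);
     }

     @Override
     public void characters(char[] buffer, int start, int length) {
         // May be called several times per element: append, never overwrite
         stringBuilder.append(buffer, start, length);
     }

     @Override
     public void endElement(String uri, String localName, String qName) throws SAXException {
         // The element's complete text is only available here
         System.out.println(stringBuilder.toString());
     }
 };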

Parsing the string "<value>•CERTASS >> Certass</value>" with the handler above prints:

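?CERTASS >> Certass

The leading '?' most likely comes from the console rather than the parser: System.out could not represent '•' in its default charset and substituted '?'. The text accumulated by the handler should still contain the full '•CERTASS >> Certass'.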
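For reference, here is a self-contained sketch that wires the same handler to the JDK's built-in SAX parser (SAXParserFactory) and returns each element's text instead of printing it, so the result can be checked without relying on the console encoding. The class name SaxTextExample and the method collectTexts are hypothetical; the XML is read through a StringReader so the '•' does not depend on byte-level encoding.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxTextExample {

    // Parses the XML string and returns the full text of each element, in end-tag order
    static List<String> collectTexts(String xml) {
        List<String> texts = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder stringBuilder = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                stringBuilder.setLength(0); // fresh buffer per element
            }

            @Override
            public void characters(char[] buffer, int start, int length) {
                stringBuilder.append(buffer, start, length); // may run several times
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                texts.add(stringBuilder.toString()); // complete text is available here
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // A character stream avoids guessing the byte encoding of the input
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return texts;
    }

    public static void main(String[] args) {
        // \u2022 is the bullet character from the question
        List<String> texts = collectTexts("<value>\u2022CERTASS >> Certass</value>");
        System.out.println(texts.get(0));
    }
}
```

Because the handler resets its buffer in startElement(), this suits leaf elements like value; for elements that mix text with child elements, the parent's text before a child would be discarded.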

I hope this helps.