rebecca - 6 months ago
Java Question

Java SAX parser splits text across multiple calls to characters()

I am working on a project that parses some data from XML.

For example, the XML is

<abc>abcdefghijklmno</abc>


I need to extract "abcdefghijklmno".

But while testing my parser, I discovered a big problem:

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class parser extends DefaultHandler {
    // True while the parser is inside an <abc> element
    private boolean hasABC = false;

    //Constructor HERE
    ......................
    ......................

    @Override
    public void startDocument() throws SAXException {
    }

    @Override
    public void endDocument() throws SAXException {
    }

    @Override
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) throws SAXException {
        if ("abc".equalsIgnoreCase(localName)) {
            this.hasABC = true;
        }
    }

    @Override
    public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
        if ("abc".equalsIgnoreCase(localName)) {
            this.hasABC = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Prints whatever text the parser hands over in this call
        String content = new String(ch, start, length).trim();
        if (this.hasABC) {
            System.out.println("ABC = " + content);
        }
    }
}


I found that the parser reports the text of the element in two pieces.
The output is:

ABC = abcdefghi

ABC = jklmno <<============ split the message

Why does the parser call characters() twice?

Does the XML contain a "\n" or "\r"?

Answer

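The parser calls characters() more than once because the SAX specification allows it: a parser may deliver contiguous character data in one chunk or split it across several calls. This lets parsers stay fast and keep their memory footprint low. It is not caused by "\n" or "\r" in the XML. If you want a single string, create a new StringBuilder in startElement, append each chunk to it in characters(), and process the accumulated text in endElement.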
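The StringBuilder approach could look roughly like the sketch below. It is only an illustration, not the asker's actual class: the names AbcHandler, getLastAbc and parse are made up for this example, and it matches on qName rather than localName because it assumes the default SAXParserFactory, which is not namespace-aware and therefore leaves localName empty.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects all text of an <abc> element into one string.
class AbcHandler extends DefaultHandler {
    private StringBuilder text;  // non-null only while inside <abc>
    private String lastAbc;      // text of the most recently closed <abc>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("abc".equalsIgnoreCase(qName)) {
            text = new StringBuilder();  // start a fresh buffer for this element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one element, so append instead of printing.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("abc".equalsIgnoreCase(qName)) {
            // Trim once, on the complete text, not on each chunk.
            lastAbc = text.toString().trim();
            text = null;
            System.out.println("ABC = " + lastAbc);
        }
    }

    public String getLastAbc() {
        return lastAbc;
    }

    // Convenience method for the example: parse an XML string and return the <abc> text.
    public static String parse(String xml) throws Exception {
        AbcHandler handler = new AbcHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getLastAbc();
    }
}
```

Calling AbcHandler.parse("<abc>abcdefghijklmno</abc>") should then give the whole text in one piece, however many times the parser chose to call characters(). Note that trimming each chunk separately, as in the question, can also drop spaces that fall on a chunk boundary, which is another reason to trim only the assembled string.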
