diborbi - 7 months ago
Java Question

StAX XML: confusion with the getName function

I have an XML file like this:

<comment type="PTM">
<text evidence="19">Sumoylated following its interaction with PIAS1 and UBE2I.</text>
</comment>
<comment type="PTM">
<text evidence="17">Ubiquitinated, leading to proteasomal degradation.</text>
</comment>
<comment type="disease">
<text>A chromosomal aberration involving ZMYND11 is a cause of acute poorly differentiated myeloid leukemia. Translocation (10;17)(p15;q21) with MBTD1.</text>
</comment>
<comment type="disease" evidence="23">
<disease id="DI-04257">
<name>Mental retardation, autosomal dominant 30</name>
<acronym>MRD30</acronym>
<description>A disorder characterized by significantly below average general intellectual functioning associated with impairments in adaptive behavior and manifested during the developmental period. MRD30 patients manifest mild intellectual disability and subtle facial dysmorphisms, including hypertelorism, ptosis, and a wide mouth.</description>
<dbReference type="MIM" id="616083"/>
</disease>
<text>The disease is caused by mutations affecting the gene represented in this entry.</text>
</comment>
<comment type="similarity">
<text evidence="8">Contains 1 bromo domain.</text>
</comment>
<comment type="similarity">
<text evidence="9">Contains 1 MYND-type zinc finger.</text>
</comment>


I use StAX to extract the disease information. This is part of my code:

XMLInputFactory factory = XMLInputFactory.newInstance();
XMLEventReader eventReader = factory.createXMLEventReader(new FileReader(p));

while (eventReader.hasNext()) {
    XMLEvent event = eventReader.nextEvent();
    switch (event.getEventType()) {
        case XMLStreamConstants.START_ELEMENT:
            StartElement startElement = event.asStartElement();
            String qName = startElement.getName().getLocalPart();
            if (qName.equalsIgnoreCase("comment")) {
                System.out.println("Start Element : comment");
                Iterator<Attribute> attributes = startElement.getAttributes();
                Attribute a = attributes.next();
                System.out.println("ATRIBUTES " + a.getName());
                type = a.getValue();
                System.out.println("Roll No : " + type);
            } else if (qName.equalsIgnoreCase("text") && type.equals("disease")) {
                text = true;
            }
            break;

        case XMLStreamConstants.CHARACTERS:
            Characters characters = event.asCharacters();
            if (text) {
                res = res + " " + characters.getData();
                //System.out.println("TEXT: " + res);
                text = false;
            }
            break;

        case XMLStreamConstants.END_ELEMENT:
            EndElement endElement = event.asEndElement();
            if (endElement.getName().getLocalPart().equalsIgnoreCase("comment")) {
                //System.out.println("End Element : comment");
                //System.out.println();
            }
            break;


For this type of line:

<comment type="disease">


I can extract the info correctly, but when I try to find comment type "disease" in this line:

<comment type="disease" evidence="23">


it gives me type=evidence and not type=disease as it should. As a result, nothing is saved for this kind of line.

ug_
Answer

First of all, please get in the habit of using meaningful variable names. You have the following variables, with their types: a (Attribute), text (boolean), qName (String). These names leave me scratching my head and wondering what they are:

a - Not a useful name. It should be something like typeAttr, to show that it is meant to hold the type="" attribute.

text - It's a boolean?! Something like collectText would be more appropriate, since it signals that the value of the next text event should be collected.

qName - It's a String holding the local part of a QName. If it's not a QName, don't name it as one.


But that's enough ranting; you get the idea. Your problem lies in how you get the attribute. XML attributes have no significant order, so you should not expect them to be returned in the order in which they are written. In your code you have the following:

Iterator<Attribute> attributes = startElement.getAttributes();
Attribute a = attributes.next(); 
System.out.println("ATRIBUTES " + a.getName());
type = a.getValue();

Here you take the first attribute the iterator returns and set type to its value. Because attribute order is not guaranteed, for <comment type="disease" evidence="23"> you are getting the evidence attribute. You should get the attribute by name instead:

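// QName is javax.xml.namespace.QName
Attribute typeAttr = startElement.getAttributeByName(QName.valueOf("type"));
// getAttributeByName returns null when the element has no such attribute
if (typeAttr != null) {
    type = typeAttr.getValue();
    System.out.println("comment type: " + type);
}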
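
Putting the pieces together, here is a minimal, self-contained sketch of the whole loop with the lookup by name and the renamed variables. It is only an illustration under a few assumptions: the class name DiseaseCommentReader and the helper diseaseTexts are hypothetical, the XML is read from a String wrapped in a made-up root element instead of from your file, and the text is accumulated until the closing </text> tag, because a parser may deliver the content of one element as several CHARACTERS events.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class DiseaseCommentReader {

    // Hypothetical helper: returns the content of every <text> element
    // found inside a <comment type="disease"> element.
    static List<String> diseaseTexts(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
        List<String> texts = new ArrayList<>();
        String commentType = null;        // type of the enclosing <comment>, if any
        StringBuilder currentText = null; // non-null only while inside a wanted <text>

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                String localName = start.getName().getLocalPart();
                if (localName.equals("comment")) {
                    // Look the attribute up by name; never rely on attribute order.
                    Attribute typeAttr = start.getAttributeByName(new QName("type"));
                    commentType = (typeAttr == null) ? null : typeAttr.getValue();
                } else if (localName.equals("text") && "disease".equals(commentType)) {
                    currentText = new StringBuilder();
                }
            } else if (event.isCharacters() && currentText != null) {
                // Append, since one element's text can arrive in several events.
                currentText.append(event.asCharacters().getData());
            } else if (event.isEndElement()) {
                String localName = event.asEndElement().getName().getLocalPart();
                if (localName.equals("text") && currentText != null) {
                    texts.add(currentText.toString());
                    currentText = null;
                } else if (localName.equals("comment")) {
                    commentType = null;
                }
            }
        }
        reader.close();
        return texts;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up sample; the <entry> root exists only to make it well-formed.
        String xml = "<entry>"
            + "<comment type=\"PTM\"><text evidence=\"17\">Ubiquitinated.</text></comment>"
            + "<comment type=\"disease\" evidence=\"23\"><text>Caused by mutations.</text></comment>"
            + "</entry>";
        System.out.println(diseaseTexts(xml));
    }
}
```

With the sample above, only the text of the disease comment is collected, whether or not the comment element also carries an evidence attribute. Text inside nested elements such as <name> or <description> is skipped because collection starts only at a <text> start tag.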