eckes eckes - 11 months ago 153
Java Question

Ignore XML doctype declarations in XMLReader (XXE)

I use a non-validating reader to display or process untrusted XML documents. I do not need support for internal entities, but I do want to be able to process the documents even if a DOCTYPE is present.

With the disallow-doctype-decl feature of SAX I can make sure that parsing an XML document carries no risk of external entity resolution or "billion laughs" DoS expansion. This is also recommended by the OWASP XXE prevention cheat sheet.

XMLReader reader = XMLReaderFactory.createXMLReader();
reader.setFeature("", true);

reader.setFeature("", true);

// or
reader.setFeature("", false);
reader.setFeature("", false);
reader.setFeature("", false);

Unfortunately, this aborts parsing when a DOCTYPE is present:

org.xml.sax.SAXParseException; systemId: file:... ; lineNumber: 2; columnNumber: 10;
DOCTYPE is disallowed when the
feature "" set to true.
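
For reference, the failure above can be reproduced with a small sketch like the following. It assumes the feature in question is the Xerces feature URI `http://apache.org/xml/features/disallow-doctype-decl` (the URI is missing from the snippet and the message above, so this is a guess based on the feature's name). The reader is obtained through `SAXParserFactory` rather than `XMLReaderFactory`. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class DisallowDoctypeDemo {
    // Assumption: this is the feature URI meant by "disallow DOCTYPE-decl".
    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    /** Returns true if the parser rejects the document with a SAXParseException. */
    static boolean isRejected(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setFeature(DISALLOW_DOCTYPE, true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // DefaultHandler ignores content events and rethrows fatal errors.
        DefaultHandler handler = new DefaultHandler();
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return false; // parsed without a fatal error
        } catch (SAXParseException e) {
            return true;  // e.g. "DOCTYPE is disallowed ..."
        }
    }
}
```

With this setup any document that contains a DOCTYPE is refused outright, even when its internal subset is harmless, which is exactly the behaviour I want to avoid.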

And if I ignore this fatal error, then it will happily resolve internal entities, as you can see here:

Is there a combination of features that lets me read past the DOCTYPE declaration without evaluating it (especially avoiding recursive entity expansion)?

I would like to avoid defining my own Apache-specific security-manager property or a special resolver.

Answer Source

According to core-libs-dev, XMLReaderFactory will be deprecated in Java 9, and the way to obtain an XMLReader will be through a SAXParser.

In that case FEATURE_SECURE_PROCESSING (FSP) can be used. It establishes some resource limits and also removes the remote scheme handlers for ACCESS_EXTERNAL_DTD and ACCESS_EXTERNAL_SCHEMA:

import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

SAXParserFactory spf = SAXParserFactory.newInstance();
// when FSP is activated explicitly, it also restricts external entities
spf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
XMLReader reader = spf.newSAXParser().getXMLReader();
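
To see what this configuration does with a document that carries a DOCTYPE, here is a minimal, self-contained sketch. The class name `SecureProcessingDemo` and the helper `textOf` are hypothetical; only the standard JAXP/SAX calls are real API. It parses a string with FSP enabled and collects the character data.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SecureProcessingDemo {
    /** Parses xml with FSP enabled and returns the concatenated character data. */
    static String textOf(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // collect element text
            }
        };
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler); // fatal errors are rethrown
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }
}
```

Note that FSP does not skip the DOCTYPE. A document such as `<!DOCTYPE r [<!ENTITY e "x">]><r>a&e;b</r>` is accepted and the small internal entity is still expanded, so `textOf` returns `axb`. The protection comes from the JDK's entity expansion limits, which should stop recursive "billion laughs" style expansion with a fatal error rather than exhausting memory.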