Rachel - 1 year ago
Java Question

How to parse XHTML while ignoring the DOCTYPE declaration using a DOM parser

I run into a problem when parsing XHTML that contains a DOCTYPE declaration with a DOM parser:

java.io.IOException: Server returned HTTP response code: 503 for URL:

Declaration: DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

Is there a way to parse the XHTML into a Document object while ignoring the DOCTYPE declaration?

Answer Source

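A solution that works for me is to give the DocumentBuilder a fake EntityResolver that returns an empty stream, so the parser never tries to download the DTD named in the DOCTYPE (that download is most likely the request failing with the 503). There's a good explanation here (look at the last message from kdgregory).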


Here's kdgregory's solution:

documentBuilder.setEntityResolver(new EntityResolver() {
    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // Hand back an empty stream instead of fetching the external DTD
        return new InputSource(new StringReader(""));
    }
});
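
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. The class name DoctypeIgnoringParser, the parse helper, and the sample XHTML string (including the system identifier in its DOCTYPE) are made up for illustration and are not from the original post. It assumes the default JDK parser and an input without named entities such as &nbsp;, since those are declared in the DTD that is being skipped and would probably fail to resolve.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, named here only for the example.
class DoctypeIgnoringParser {

    // Parses XHTML text into a DOM Document without fetching the external DTD.
    static Document parse(String xhtml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();

        // Every external entity lookup (including the DTD referenced by the
        // DOCTYPE) gets an empty stream, so no network request is made.
        builder.setEntityResolver(new EntityResolver() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                return new InputSource(new StringReader(""));
            }
        });

        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    public static void main(String[] args) throws Exception {
        // Sample input, assumed for the demo.
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>Demo</title></head>"
            + "<body><p>Hello</p></body></html>";

        Document doc = parse(xhtml);
        // Print the root element name to confirm the parse succeeded.
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

The DOCTYPE is still read by the parser; only the external DTD lookup is short-circuited, so the resulting Document keeps its doctype node while the markup is parsed as plain well-formed XML.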