Rachel Rachel - 2 months ago 18
Java Question

How to parse a xhtml ignoring the DOCTYPE declaration using DOM parser

Hi
I face issue parsing xhtml with DOCTYPE declaration using DOM parser.

Error:
java.io.IOException: Server returned HTTP response code: 503 for URL:
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd%20

Declaration: DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd

Is there a way to parse the xhtml to a Document object ignoring the DOCTYPE declaration.

Answer

A solution that works for me is to give the DocumentBuilder a fake Resolver that returns an empty stream. There's a good explanation here (look at the last message from kdgregory)

http://forums.sun.com/thread.jspa?threadID=5362097

here's kdgregory's solution:

documentBuilder.setEntityResolver(new EntityResolver()
        {
            public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException
            {
                return new InputSource(new StringReader(""));
            }
        });