ybakos, 7 months ago

How does one properly validate an XML file against a local DTD file using Nokogiri?

I have a simple, valid DTD and a valid XML file that seems to conform to it, but Nokogiri reports many validation errors, so the XML file fails validation.

The DTD file (wayland.dtd) is:

<!ELEMENT protocol (copyright?, description?, interface+)>
<!ATTLIST protocol name CDATA #REQUIRED>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT interface (description?,(request|event|enum)+)>
<!ATTLIST interface name CDATA #REQUIRED>
<!ATTLIST interface version CDATA #REQUIRED>
<!ELEMENT request (description?,arg*)>
<!ATTLIST request name CDATA #REQUIRED>
<!ATTLIST request type CDATA #IMPLIED>
<!ATTLIST request since CDATA #IMPLIED>
<!ELEMENT event (description?,arg*)>
<!ATTLIST event name CDATA #REQUIRED>
<!ATTLIST event since CDATA #IMPLIED>
<!ELEMENT enum (description?,entry*)>
<!ATTLIST enum name CDATA #REQUIRED>
<!ATTLIST enum since CDATA #IMPLIED>
<!ATTLIST enum bitfield CDATA #IMPLIED>
<!ELEMENT entry (description?)>
<!ATTLIST entry name CDATA #REQUIRED>
<!ATTLIST entry value CDATA #REQUIRED>
<!ATTLIST entry summary CDATA #IMPLIED>
<!ATTLIST entry since CDATA #IMPLIED>
<!ELEMENT arg (description?)>
<!ATTLIST arg name CDATA #REQUIRED>
<!ATTLIST arg type CDATA #REQUIRED>
<!ATTLIST arg summary CDATA #IMPLIED>
<!ATTLIST arg interface CDATA #IMPLIED>
<!ATTLIST arg allow-null CDATA #IMPLIED>
<!ATTLIST arg enum CDATA #IMPLIED>
<!ELEMENT description (#PCDATA)>
<!ATTLIST description summary CDATA #REQUIRED>


The XML file (wayland.xml) is:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE protocol SYSTEM "wayland.dtd">
<protocol name="wayland">

<copyright>
FOO
SOFTWARE.
</copyright>

<interface name="wl_display" version="1">
<description summary="core global object">
The core global object. This is a special singleton object. It
is used for internal Wayland protocol features.
</description>

<request name="sync">
<description summary="asynchronous roundtrip">
The sync request asks the server to emit the 'done' event
on the returned wl_callback object. Since requests are
handled in-order and events are delivered in-order, this can
be used as a barrier to ensure all previous requests and the
resulting events have been handled.

The object returned by this request will be destroyed by the
compositor after the callback is fired and as such the client must not
attempt to use it after that point.

The callback_data passed in the callback is the event serial.
</description>
<arg name="callback" type="new_id" interface="wl_callback"/>
</request>
</interface>

</protocol>


My simple Ruby program is:

require 'nokogiri'

DTD_PATH = "wayland.dtd"
XML_PATH = "wayland.xml"

dtd_doc = Nokogiri::XML::Document.parse(open(DTD_PATH))
dtd = Nokogiri::XML::DTD.new('protocol', dtd_doc)
doc = Nokogiri::XML(open(XML_PATH))
puts dtd.validate(doc)


The program prints the array of validation errors, which is not empty. Sample output:

No declaration for attribute name of element request
No declaration for element description
No declaration for attribute summary of element description


Even after adding a DOCTYPE declaration to the XML file, à la:

<!DOCTYPE protocol SYSTEM "wayland.dtd">


And wrapping the DTD with:

<!DOCTYPE protocol [
...
]>


I still observe the same failed validation output. What am I doing wrong?

Answer

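The DTD object in your code never receives the declarations from wayland.dtd: a DTD file is not a well-formed XML document, so Document.parse cannot load it, and DTD.new simply creates a new node rather than reading a DTD. Validating against a DTD with no declarations is consistent with the "No declaration for ..." errors you see.

Instead, let Nokogiri load the DTD itself by passing the DTDVALID parse option. For this to work, the XML file must reference the DTD with a DOCTYPE declaration such as <!DOCTYPE protocol SYSTEM "wayland.dtd">, which yours already does.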

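require 'nokogiri'

XML_PATH = "wayland.xml"

xml = File.read(XML_PATH)
# DTDVALID tells libxml2 to load the DTD named in the DOCTYPE
# declaration and validate the document while parsing it.
options = Nokogiri::XML::ParseOptions::DTDVALID
# Passing XML_PATH as the URL makes the relative SYSTEM identifier
# ("wayland.dtd") resolve relative to the XML file, not the working directory.
doc = Nokogiri::XML::Document.parse(xml, XML_PATH, nil, options)
# external_subset is the DTD loaded from the DOCTYPE declaration;
# validate returns an array of errors (empty when the document is valid).
puts doc.external_subset.validate(doc)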