Kasun Siyambalapitiya - 1 year ago
Java Question

Inserting special characters like "#" in enumerated attribute values in XML DTD

I have the following xml.dtd file:

<?xml version="1.0" encoding="UTF-8"?>

<!ELEMENT aliens (alien+,alienTesting)>
<!ELEMENT alien (name,from,middleName?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT middleName (#PCDATA)>

<!--defining element attributes -->


<!ATTLIST alien aid ID #REQUIRED>
<!ATTLIST alien bioType CDATA #IMPLIED>

<!ATTLIST alien lang (Java|C|Python) "Java">

<!ELEMENT alienTesting (alienT*)>
<!ELEMENT alienT (#PCDATA)>


And here is the XML file:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE aliens SYSTEM "AleinDTD.dtd">

<aliens>

<alien aid="a01">
<name>Kasun </name>
<from>Northwest</from>
</alien>

<alien aid="a02">
<name>Madu</name>
<from>south</from>

</alien>

<alienTesting>
<alienT></alienT>

</alienTesting>

</aliens>


What I want is to allow Java, C#, and Python as the enumerated values of the lang attribute.
So I changed the declaration as below:

<!ATTLIST alien lang (Java|C#|Python) "Java">


The parser then gives me the following error:


The enumerated type list must end with ')' in the "lang" attribute declaration


How can I fix this? Thanks in advance.
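The error can be reproduced from Java with the JDK's built-in JAXP parser. This is a minimal sketch, assuming an internal DTD subset instead of the external AleinDTD.dtd so that it is self-contained; the class name EnumDtdCheck and the helpers documentWith and parse are hypothetical. It parses the same tiny document twice with validation on: once with (Java|C|Python) and once with (Java|C#|Python).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class EnumDtdCheck {

    // Builds a tiny document whose internal DTD subset declares the "lang"
    // attribute with the given enumeration, e.g. "Java|C#|Python".
    static String documentWith(String enumeration) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<!DOCTYPE alien [\n"
             + "  <!ELEMENT alien (#PCDATA)>\n"
             + "  <!ATTLIST alien lang (" + enumeration + ") \"Java\">\n"
             + "]>\n"
             + "<alien>Kasun</alien>\n";
    }

    // Returns null if the document parses and validates, otherwise the parser's message.
    static String parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors; we also rethrow validity errors.
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        });
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Java|C|Python  -> " + parse(documentWith("Java|C|Python")));
        System.out.println("Java|C#|Python -> " + parse(documentWith("Java|C#|Python")));
    }
}
```

The first call should return null, while the second should fail with a fatal error along the lines of the one quoted above, because the DTD itself is malformed before validation even starts.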

Answer

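I'm afraid it isn't possible. According to the XML Specification, §3.3.1 Attribute Types, the values in an enumerated attribute type must be Nmtokens: [59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'. So after reading the token C, the parser expects either | or ), finds #, and reports that the list must end with ')'. The characters allowed in an Nmtoken are listed in §2.3 Common Syntactic Constructs: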

The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names because they are more useful as delimiters in contexts where XML names are used outside XML documents; providing this group gives those contexts hard guarantees about what cannot be part of an XML name. The character #x037E, GREEK QUESTION MARK, is excluded because when normalized it becomes a semicolon, which could change the meaning of entity references.

[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]

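Roughly speaking, you may use letters (from any script), digits, hyphens, periods, underscores, and colons, but not spaces, #, (, ), [, ], |, or other punctuation marks. Since [7] Nmtoken ::= (NameChar)+, any of the allowed characters may also appear first, unlike in an XML Name.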
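To check a candidate value before putting it in the DTD, productions [4], [4a] and [7] can be transcribed almost literally into Java. This is a sketch only; the class NmtokenCheck and its methods are hypothetical names, and the sample values are just illustrations of the rule.

```java
public class NmtokenCheck {

    // Production [4] NameStartChar (XML 1.0 Fifth Edition, §2.3).
    static boolean isNameStartChar(int c) {
        return c == ':' || (c >= 'A' && c <= 'Z') || c == '_' || (c >= 'a' && c <= 'z')
            || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
            || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
            || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
            || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
            || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
            || (c >= 0xFDF0 && c <= 0xFFFD) || (c >= 0x10000 && c <= 0xEFFFF);
    }

    // Production [4a] NameChar.
    static boolean isNameChar(int c) {
        return isNameStartChar(c) || c == '-' || c == '.' || (c >= '0' && c <= '9')
            || c == 0xB7 || (c >= 0x0300 && c <= 0x036F) || (c >= 0x203F && c <= 0x2040);
    }

    // Production [7] Nmtoken ::= (NameChar)+
    // codePoints() joins surrogate pairs, so characters above #xFFFF are checked correctly.
    static boolean isNmtoken(String s) {
        return !s.isEmpty() && s.codePoints().allMatch(NmtokenCheck::isNameChar);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"Java", "C", "Python", "C#", "CSharp", "C-Sharp"}) {
            System.out.println(v + " -> " + isNmtoken(v));
        }
    }
}
```

Running main prints true for every value except C#, which fails because # (#x23) is in neither NameStartChar nor NameChar.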