Kasun Siyambalapitiya - 1 month ago
Java Question

Inserting special characters like "#" in enumerated attribute values in XML DTD

I have the following xml.dtd file:

<?xml version="1.0" encoding="UTF-8"?>

<!ELEMENT aliens (alien+,alienTesting)>
<!ELEMENT alien (name,from,middleName?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT middleName (#PCDATA)>

<!--defining element attributes -->


<!ATTLIST alien aid ID #REQUIRED>
<!ATTLIST alien bioType CDATA #IMPLIED>

<!ATTLIST alien lang (Java|C|Python) "Java">

<!ELEMENT alienTesting (alienT*)>
<!ELEMENT alienT (#PCDATA)>


And here is the XML file:

<?xml version="1.0" encoding="UTF-8"?>

<!DOCTYPE aliens SYSTEM "AleinDTD.dtd">

<aliens>

<alien aid="a01">
<name>Kasun </name>
<from>Northwest</from>
</alien>

<alien aid="a02">
<name>Madu</name>
<from>south</from>

</alien>

<alienTesting>
<alienT></alienT>

</alienTesting>

</aliens>


What I want is to have Java, C#, and Python as the enumerated attribute values. So I changed the declaration as below:

<!ATTLIST alien lang (Java|C#|Python) "Java">


It gives me an error:


The enumerated type list must end with ')' in the "lang" attribute declaration


How can I fix this? Thanks in advance.

Answer

I'm afraid it isn't possible. According to the XML specification, §3.3.1 Attribute Types, each enumerated value must be an Nmtoken, and the characters allowed in one are listed here:

The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names because they are more useful as delimiters in contexts where XML names are used outside XML documents; providing this group gives those contexts hard guarantees about what cannot be part of an XML name. The character #x037E, GREEK QUESTION MARK, is excluded because when normalized it becomes a semicolon, which could change the meaning of entity references.

[4] NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] | [#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] | [#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] | [#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] | [#x10000-#xEFFFF]

[4a] NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7 | [#x0300-#x036F] | [#x203F-#x2040]
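
To check a candidate value before putting it in the DTD, the two productions above can be transcribed into a Java regular expression. This is only a sketch: the class name NmtokenCheck and the method isNmtoken are made up for illustration, and the pattern is a direct copy of productions [4] and [4a] (an Nmtoken is one or more NameChar).

```java
import java.util.regex.Pattern;

// Hypothetical helper: tests whether a string is an XML 1.0 Nmtoken,
// i.e. one or more NameChar as defined by productions [4] and [4a].
final class NmtokenCheck {

    // Production [4] NameStartChar, written as the inside of a regex character class.
    private static final String NAME_START_CHAR =
            ":A-Z_a-z"
            + "\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF"
            + "\\u0370-\\u037D\\u037F-\\u1FFF"
            + "\\u200C-\\u200D\\u2070-\\u218F\\u2C00-\\u2FEF"
            + "\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD"
            + "\\x{10000}-\\x{EFFFF}";

    // Production [4a] NameChar = NameStartChar plus "-", ".", digits and a few extras.
    private static final String NAME_CHAR =
            NAME_START_CHAR + "\\-.0-9\\u00B7\\u0300-\\u036F\\u203F-\\u2040";

    // Nmtoken ::= (NameChar)+
    private static final Pattern NMTOKEN = Pattern.compile("[" + NAME_CHAR + "]+");

    static boolean isNmtoken(String value) {
        return NMTOKEN.matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"Java", "C#", "CSharp", "Python"}) {
            System.out.println(candidate + " -> " + isNmtoken(candidate));
        }
    }
}
```

Here "C#" is rejected because "#" appears in neither production, while a spelling such as "CSharp" passes.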

Roughly speaking, you may use letters (from any language), digits, hyphens, dots, underscores, and colons, but not spaces, # ( ) [ ] | or other punctuation marks. That is why C# is rejected: # is not a NameChar.
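
If the goal is just to get the document validating, one possible workaround is to use an Nmtoken-safe spelling in the DTD and map it back to "C#" in your own code. The sketch below assumes the spelling CSharp (my choice, nothing in the spec suggests it) and a cut-down version of the question's DTD, inlined as an internal subset so the example is self-contained. The class name EnumeratedLangDemo and the method validate are hypothetical; the parser is the JDK's standard JAXP validating parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo: validates a tiny <alien> document against an inline DTD
// whose "lang" enumeration uses CSharp as a stand-in for C#.
final class EnumeratedLangDemo {

    // Simplified from the question's DTD; only what the example needs.
    private static final String DOCTYPE =
            "<!DOCTYPE alien [\n"
            + "<!ELEMENT alien (name)>\n"
            + "<!ELEMENT name (#PCDATA)>\n"
            + "<!ATTLIST alien lang (Java|CSharp|Python) \"Java\">\n"
            + "]>\n";

    // Returns the validation error messages for a document using the given lang value.
    static List<String> validate(String langValue) {
        String xml = DOCTYPE
                + "<alien lang=\"" + langValue + "\"><name>Kasun</name></alien>";
        List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // turn on DTD validation
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored here */ }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Well-formedness or configuration problems are not expected in this sketch.
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println("CSharp: " + validate("CSharp"));
        System.out.println("C#:     " + validate("C#"));
    }
}
```

With this DTD, lang="CSharp" validates cleanly, while lang="C#" in the document is reported as a validation error because it is not one of the declared values. The display name "C#" would then have to be produced by the application, for example from a small lookup table.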