schlechtums - 9 days ago
C# Question

XSD validation not failing on a trailing newline

XML validation is not something I touch except when I have to, so there's probably something stupid I'm missing, and so far googling hasn't turned up any help. I have a type with a pattern restriction that allows only letters and spaces. An element with a leading newline fails validation, but one with a trailing newline passes. How do I get the trailing newline to fail?

I've created a stripped down test case as follows:

Validation Code:

using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Cache;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public List<XsdValidationError> ValidateXmlAgainstXsd(String xml, String xsdFilePath, Boolean processSchemaLocation = false)
{
    var ret = new List<XsdValidationError>();
    var xss = new XmlSchemaSet();

    var xmlUrlResolver = new XmlUrlResolver();
    xmlUrlResolver.CachePolicy = new RequestCachePolicy(RequestCacheLevel.Default);
    xss.XmlResolver = xmlUrlResolver;

    // Read targetNamespace from the schema so it is added under the right namespace.
    var xsdXElement = XElement.Parse(File.ReadAllText(xsdFilePath));
    var targetNamespaceAttribute = xsdXElement.Attribute("targetNamespace");
    xss.Add(targetNamespaceAttribute != null ? targetNamespaceAttribute.Value : "", xsdFilePath);

    var settings = new XmlReaderSettings();
    settings.ValidationType = ValidationType.Schema;
    settings.Schemas = xss;
    settings.ValidationFlags = XmlSchemaValidationFlags.ProcessInlineSchema;

    if (processSchemaLocation)
        settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation;

    // Collect every validation problem instead of throwing on the first one.
    settings.ValidationEventHandler += (sender, e) =>
    {
        var xve = new XsdValidationError { Message = e.Message, LineNumber = e.Exception.LineNumber, LinePosition = e.Exception.LinePosition };

        ret.Add(xve);
    };

    using (var sr = new StringReader(xml))
    using (var xr = XmlReader.Create(sr, settings))
    {
        // Reading to the end drives the validation.
        while (xr.Read()) { }

        return ret;
    }
}

public class XsdValidationError
{
    public String Message { get; set; }
    public int LineNumber { get; set; }
    public int LinePosition { get; set; }

    public override string ToString()
    {
        return String.Format("Line {0:n0}, Position {1:n0}: {2}", this.LineNumber, this.LinePosition, this.Message);
    }
}


Input XML and XSD:

<People>
<Person>Hello Person One
</Person>
<Person>Hello Person Two</Person>
<Person>
Hello Person Three</Person>
</People>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="People">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Person" maxOccurs="unbounded">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="[a-zA-Z ]+"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>


In the XML, Person One (trailing newline) passes validation when it should fail, Person Two passes as it should, and Person Three (leading newline) fails as it should. I need Person One to fail as well.

I cannot change the input XML or the XSD. Visual Studio correctly validates the file. Any ideas?

EDIT:

I have discovered that if I load the XML into an XmlDocument first and validate through an XmlNodeReader, Person One fails as it should, but I lose the line number information.

var xd = new XmlDocument();
xd.LoadXml(xml);
var xr = XmlReader.Create(new XmlNodeReader(xd), settings);

Answer

I think this is a quirk/bug in Microsoft's XSD parser. The definition of the xs:string type is:

 <xsd:simpleType name="string" id="string">
   <xsd:restriction base="xsd:anySimpleType">
     <xsd:whiteSpace value="preserve"/>
   </xsd:restriction>
 </xsd:simpleType>

As the whiteSpace facet is set to 'preserve', the value should contain everything in the element, whitespace and all.

However, as you have noticed, it's ignoring the trailing whitespace. There doesn't seem to be much you can do about this other than apply the validation rule manually in your code.
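A minimal sketch of such a manual check, assuming the pattern [a-zA-Z ]+ from the schema is simply duplicated in code. The names PersonPatternCheck and FindInvalidPersons are hypothetical and not from the original post. It loads the XML with LoadOptions.SetLineInfo so that line numbers stay available, and anchors the regex with \A and \z so that a trailing newline is rejected:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Linq;

public static class PersonPatternCheck
{
    // Same character class as the xs:pattern facet, anchored with \A and \z
    // so that nothing (including a trailing newline) may follow the match.
    private static readonly Regex LettersAndSpaces = new Regex(@"\A[a-zA-Z ]+\z");

    // Returns one message per <Person> element whose text violates the pattern.
    // Assumes the elements are in no namespace, as in the sample XML.
    public static List<string> FindInvalidPersons(string xml)
    {
        var errors = new List<string>();

        // SetLineInfo keeps line/position data; PreserveWhitespace leaves the text as written.
        var doc = XDocument.Parse(xml, LoadOptions.SetLineInfo | LoadOptions.PreserveWhitespace);

        foreach (var person in doc.Descendants("Person"))
        {
            if (LettersAndSpaces.IsMatch(person.Value))
                continue;

            // XElement exposes its position through IXmlLineInfo.
            var info = (IXmlLineInfo)person;
            errors.Add(String.Format("Line {0:n0}, Position {1:n0}: Person value does not match [a-zA-Z ]+",
                info.LineNumber, info.LinePosition));
        }

        return errors;
    }
}
```

The messages could be merged with the list returned by ValidateXmlAgainstXsd. The obvious drawback is that the pattern then lives in two places, the XSD and the code.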

Incidentally, it validates as expected in Xerces (erroring for both leading and trailing whitespace).
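One possible explanation for the difference, offered as an assumption and not confirmed anywhere in this thread, is that the pattern facet ends up being evaluated by a .NET Regex anchored with ^ and $. In .NET, $ (without RegexOptions.Multiline) matches at the end of the string or just before a final newline, whereas \z matches only at the very end. The sketch below shows that Regex behaviour in isolation; AnchorDemo and its methods are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // "$" also matches just before a final "\n", so a single trailing newline is tolerated.
    public static bool MatchesWithDollar(string value)
    {
        return Regex.IsMatch(value, "^[a-zA-Z ]+$");
    }

    // "\z" matches only at the very end of the string, so a trailing newline is rejected.
    public static bool MatchesWithAbsoluteEnd(string value)
    {
        return Regex.IsMatch(value, @"\A[a-zA-Z ]+\z");
    }
}
```

With the value "Hello Person One\n", MatchesWithDollar returns true while MatchesWithAbsoluteEnd returns false; a leading newline fails both, which mirrors the behaviour described in the question.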