Nick Sologoub - 1 month ago
C# Question

XmlTextReader ignores CheckCharacters=false when Normalisation is on

I have implemented my own XmlTextReader subclass that overrides Settings to set CheckCharacters to false. Something like this:

class MyXmlTextReader : XmlTextReader
{
    public MyXmlTextReader(TextReader input) : base(input)
    {
    }

    /// <summary>
    /// Settings
    /// </summary>
    public override XmlReaderSettings Settings
    {
        get { return new XmlReaderSettings { CheckCharacters = false }; }
    }
}


When I use it in the normal scenario with invalid XML data, everything works fine:

var sr3 = new StringReader(xml);
var xr3 = new MyXmlTextReader(sr3);
var obj3 = (MyObject)ser.Deserialize(xr3);


But as soon as I turn on normalisation, I start getting InvalidCharacter exceptions:

var sr3 = new StringReader(xml);
var xr3 = new MyXmlTextReader(sr3);
xr3.Normalization = true;
var obj3 = (MyObject)ser.Deserialize(xr3);


Is there a way to keep normalisation on and still ignore invalid XML characters?

Here is a sample application to reproduce the problem:
https://gist.github.com/ncksol/29bd6490edd0580c25f7338b417b37d3

Answer

This appears to be a shortcoming in the implementation:

  • XmlReader has no Normalization property.
  • XmlReader.Create allows you to pass CheckCharacters as a setting, but since it returns XmlReader, you can't control the normalization through it.
  • XmlTextReader (actually wrapping XmlTextReaderImpl) has Normalization, but no public CheckCharacters property, and no way of accepting XmlReaderSettings.
  • Finally, XmlTextReaderImpl, which does all the real work, can both normalize and skip character checking, but because of all of the above, there is no public path to configuring it that way.
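
To illustrate the XmlReader.Create half of that list, here is a minimal sketch of what CheckCharacters controls on a reader created through the factory. The class name CharacterCheckDemo and the method CanParse are made up for this example. It relies on the documented behaviour that CheckCharacters = false turns off checking of character entity references such as &#x1;.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class; not part of System.Xml.
static class CharacterCheckDemo
{
    // Returns true if the whole document can be read without an XmlException.
    public static bool CanParse(string xml, bool checkCharacters)
    {
        var settings = new XmlReaderSettings { CheckCharacters = checkCharacters };
        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                // Walk every node so that any character check actually runs.
                while (reader.Read()) { }
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

Calling CanParse("<a>&#x1;</a>", false) should succeed, while the same input with checkCharacters set to true should be rejected. Normalization cannot be switched on here, because the XmlReader that comes back does not expose it publicly.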

If you don't mind relying on the implementation in this case, it can be done through reflection:

var sr3 = new StringReader(xml);
var xr3 = XmlReader.Create(sr3, new XmlReaderSettings { CheckCharacters = false });
// xr3.Normalization is not publicly accessible, so set it via reflection
// (requires using System.Reflection)
xr3.GetType()
    .GetProperty("Normalization", BindingFlags.Instance | BindingFlags.NonPublic)
    .SetValue(xr3, true);

var obj3 = (MyObject)ser.Deserialize(xr3);

Hacky, but still far preferable to reimplementing XmlTextReader from scratch, which, given all the cleverness in the implementation, is not something to undertake lightly.

Note that XmlReader.Create is not contractually obligated to return an instance of a type that has a Normalization property; it just happens to do so in the current implementation.
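
One way to guard against that is to check that the property exists before setting it, and to report whether normalization was actually switched on. This is a minimal sketch, not a definitive implementation: the LenientXmlReader class and its Create method are made up for this example, and the property name "Normalization" is an implementation detail that may change.

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Xml;

// Hypothetical helper; not part of System.Xml.
static class LenientXmlReader
{
    // Creates a reader with character checking off and tries to enable
    // normalization via reflection. normalizationEnabled is false when the
    // concrete reader type has no settable bool "Normalization" property.
    public static XmlReader Create(TextReader input, out bool normalizationEnabled)
    {
        XmlReader reader = XmlReader.Create(
            input, new XmlReaderSettings { CheckCharacters = false });

        // Assumption: the concrete type exposes a "Normalization" property.
        PropertyInfo prop = reader.GetType().GetProperty(
            "Normalization",
            BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.Public);

        normalizationEnabled =
            prop != null && prop.CanWrite && prop.PropertyType == typeof(bool);

        if (normalizationEnabled)
        {
            prop.SetValue(reader, true, null);
        }
        return reader;
    }
}
```

The caller can then decide what to do when normalizationEnabled comes back false, for example log a warning or fall back to the plain XmlTextReader without normalization, instead of failing with a NullReferenceException after a framework update.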