Socrates - 5 months ago

XML Serialization Deserialization HTML Entities C# .net

We receive some XML files as input, and their format is not under our control.

<?xml version="1.0" encoding="UTF-8"?>
<GroupFile..>
<Group id="10" desc="Description">
<Member id="117">&#x00B0;</Member>
</Group>
</GroupFile>


This file can contain character references (HTML entity codes) for symbols such as "°", written in hex as "&#x00B0;". The file is deserialized into Group and Member objects. On deserialization, the Member element value is correctly read as "°" and displayed in a grid. When the objects are serialized back into XML, however, the Member value is saved as "°" instead of "&#x00B0;".

Deserialization - Correct

<Member id="117">&#x00B0;</Member>
deserializes into Member object with value °

Serialization - Issue here

The same Member object with value ° serializes into
<Member id="117">°</Member>
instead of
<Member id="117">&#x00B0;</Member>
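The behaviour can be reproduced with plain XmlReader/XmlWriter, independent of the Group/Member classes. This is a minimal sketch: the reader resolves the character reference while parsing, so the object model only ever sees "°", and the writer only escapes what XML requires, so the literal character is written back.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Parse the element: the reader resolves the character reference while reading.
string input = "<Member id=\"117\">&#x00B0;</Member>";
string parsed;
using (var reader = XmlReader.Create(new StringReader(input)))
{
    reader.MoveToContent();                        // position on <Member>
    parsed = reader.ReadElementContentAsString();  // "°": the reference form is gone
}

// Write the same value back: the writer escapes only characters XML requires
// (such as '<' and '&'), so "°" is emitted as a literal character.
var output = new StringBuilder();
var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
using (var writer = XmlWriter.Create(output, settings))
{
    writer.WriteStartElement("Member");
    writer.WriteAttributeString("id", "117");
    writer.WriteString(parsed);
    writer.WriteEndElement();
}

Console.WriteLine(output.ToString()); // <Member id="117">°</Member>
```

Nothing in the deserialized value records that the input used "&#x00B0;", so the hex form has to be re-created explicitly when writing.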


How can this be prevented, so that the value is serialized back as "&#x00B0;"?

Fab
Answer

You have to apply custom serialization/deserialization to do this.

Using HttpUtility.HtmlEncode/HtmlDecode alone is not sufficient, since HtmlEncode produces decimal references (for example "&#176;" for "°"). I added the following code (its error handling could be improved) to keep the hex-escaped characters in the XML serialization.
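As an alternative to HtmlEncode followed by a decimal-to-hex conversion, the hex references can be produced in a single pass. This is only a sketch, and `EscapeToHexReferences` is a hypothetical helper, not a .NET API. It assumes every non-ASCII character should become a zero-padded hex reference like the input's "&#x00B0;", while '&', '<' and '>' get their usual named escapes.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper (not part of .NET): escapes markup characters and writes
// every non-ASCII character as a zero-padded hex reference such as &#x00B0;.
static string EscapeToHexReferences(string text)
{
    var sb = new StringBuilder(text.Length);
    for (int i = 0; i < text.Length; i++)
    {
        char c = text[i];
        switch (c)
        {
            case '&': sb.Append("&amp;"); break;
            case '<': sb.Append("&lt;"); break;
            case '>': sb.Append("&gt;"); break;
            default:
                if (c < 0x80)
                {
                    sb.Append(c); // plain ASCII is written unchanged
                }
                else
                {
                    // Combine a surrogate pair into one code point (e.g. emoji).
                    // Lone surrogates are not valid XML and are not handled specially here.
                    int codePoint = char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1])
                        ? char.ConvertToUtf32(c, text[++i])
                        : c;
                    sb.Append("&#x")
                      .Append(codePoint.ToString("X4", CultureInfo.InvariantCulture))
                      .Append(';');
                }
                break;
        }
    }
    return sb.ToString();
}

Console.WriteLine(EscapeToHexReferences("\u00B0"));       // &#x00B0;
Console.WriteLine(EscapeToHexReferences("5\u00B0 < 10")); // 5&#x00B0; &lt; 10
```

The "X4" format pads to at least four uppercase hex digits to match the input file; XML itself accepts any number of hex digits in a reference, so this is purely cosmetic.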

Update: to avoid the automatic escaping of special characters, you have to implement custom XML serialization for the class (IXmlSerializable, as seen below) and write the value with XmlWriter.WriteRaw.
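A quick way to see why WriteRaw is needed: if the already-escaped text "&#x00B0;" is passed to WriteString, the writer escapes the '&' again. In this sketch `WriteMember` is a hypothetical helper that writes one Member element either way.

```csharp
using System;
using System.Text;
using System.Xml;

// Hypothetical helper: writes <Member id="117">...</Member>, either escaped or verbatim.
static string WriteMember(string content, bool raw)
{
    var sb = new StringBuilder();
    var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
    using (var writer = XmlWriter.Create(sb, settings))
    {
        writer.WriteStartElement("Member");
        writer.WriteAttributeString("id", "117");
        if (raw)
            writer.WriteRaw(content);    // copied as-is; the caller must supply well-formed XML
        else
            writer.WriteString(content); // '&' becomes "&amp;", so the reference is not preserved
        writer.WriteEndElement();
    }
    return sb.ToString();
}

string escaped = WriteMember("&#x00B0;", raw: false);
string verbatim = WriteMember("&#x00B0;", raw: true);
Console.WriteLine(escaped);  // <Member id="117">&amp;#x00B0;</Member>
Console.WriteLine(verbatim); // <Member id="117">&#x00B0;</Member>
```

Because WriteRaw skips all checks, the string passed to it must already be correctly escaped. XmlWriter.WriteCharEntity is another option for single characters that keeps the writer's validation; as far as I know it writes the hex reference without zero padding ("&#xB0;").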

If you use the XmlSerializer:

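using System;
using System.Text.RegularExpressions;
using System.Web;               // HttpUtility (reference System.Web.dll on .NET Framework)
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

public class GroupFile
{
    [XmlElement("Group")]
    public Group[] Groups { get; set; }
}

public class Group
{
    [XmlAttribute("id")]
    public int Id { get; set; }

    // Keeps the "desc" attribute from the input so it survives the round trip.
    [XmlAttribute("desc")]
    public string Description { get; set; }

    [XmlElement("Member")]
    public Member[] Members { get; set; }
}

// Because Member implements IXmlSerializable, XmlSerializer calls ReadXml/WriteXml
// and ignores any Xml* attributes on its properties.
public class Member : IXmlSerializable
{
    // Converts the decimal references produced by HtmlEncode (e.g. "&#176;")
    // into zero-padded hex references (e.g. "&#x00B0;"); other text is left unchanged.
    public static string DecimalToHexadecimalEncoding(string html)
    {
        return Regex.Replace(html, @"&#(\d+);",
            m => "&#x" + int.Parse(m.Groups[1].Value).ToString("X4") + ";");
    }

    public int Id { get; set; }

    // The decoded value, e.g. "°".
    public string Value { get; set; }

    // The value as it should appear in the XML, e.g. "&#x00B0;".
    public string HexValue
    {
        get
        {
            if (Value == null)
            {
                return string.Empty;
            }
            // HtmlEncode escapes &, <, >, quotes and characters 160-255 as decimal
            // references; most characters above U+00FF (e.g. "€") are left as-is.
            var res = HttpUtility.HtmlEncode(Value);
            return DecimalToHexadecimalEncoding(res);
        }
    }

    public XmlSchema GetSchema()
    {
        return null;
    }

    public void ReadXml(XmlReader reader)
    {
        var attributeValue = reader.GetAttribute("id");
        if (attributeValue != null)
        {
            Id = XmlConvert.ToInt32(attributeValue);
        }
        // Here the reference "&#x00B0;" is resolved to the string "°".
        // ReadElementContentAsString also consumes </Member>, so no extra
        // ReadEndElement() call is needed (it would consume the parent's end tag).
        Value = reader.ReadElementContentAsString();
    }

    public void WriteXml(XmlWriter writer)
    {
        writer.WriteAttributeString("id", XmlConvert.ToString(Id));
        // WriteRaw bypasses escaping, so the hex references are written verbatim.
        writer.WriteRaw(HexValue);
    }
}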
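On the reading side, ReadElementContentAsString (like ReadElementString) already moves the reader past </Member>. A further ReadEndElement() inside ReadXml would therefore throw on the next <Member> or swallow the parent's </Group>. The sketch below mirrors the ReadXml logic with a plain XmlReader over a group with two members; `ReadMember` is a hypothetical stand-in for Member.ReadXml, not part of the answer's classes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical stand-in for Member.ReadXml: read "id", then the text content.
// ReadElementContentAsString leaves the reader positioned after </Member>.
static (int Id, string Value) ReadMember(XmlReader reader)
{
    int id = 0;
    string attributeValue = reader.GetAttribute("id");
    if (attributeValue != null)
    {
        id = XmlConvert.ToInt32(attributeValue);
    }
    string value = reader.ReadElementContentAsString();
    return (id, value);
}

string xml = "<Group id=\"10\"><Member id=\"117\">&#x00B0;</Member><Member id=\"118\">abc</Member></Group>";
var members = new List<(int Id, string Value)>();
using (var reader = XmlReader.Create(new StringReader(xml)))
{
    reader.MoveToContent();
    reader.ReadStartElement("Group");
    // Each call consumes exactly one <Member>...</Member>, so the loop sees the next sibling.
    while (reader.MoveToContent() == XmlNodeType.Element && reader.LocalName == "Member")
    {
        members.Add(ReadMember(reader));
    }
    reader.ReadEndElement(); // </Group> is still there for the parent to consume
}

foreach (var m in members)
{
    Console.WriteLine($"{m.Id}: {m.Value}"); // 117: °, then 118: abc
}
```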