We have some unit tests that check the removal of the UTF-8 byte order mark (BOM) from an XML string before it's loaded into an XmlDocument. Everything works fine on Windows 7 64-bit, but we noticed a bunch of tests failing when run under Windows 10 64-bit.
After a bit of investigation, we found that the XML string on Windows 10 is getting pruned (the check reports that the preamble exists), while on Windows 7 it is not.
Here is the code snippet:
public static string PruneUtf8ByteMark(string xmlString)
{
    var byteOrderMarking = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
    if (xmlString.StartsWith(byteOrderMarking))
    {
        xmlString = xmlString.Remove(0, byteOrderMarking.Length);
    }
    return xmlString;
}
The string on which StartsWith is called:
public static string XmlString = "<?xml version=\"1.0\"....
The problem is caused by culture-sensitive comparison.
byteOrderMarking contains a single invisible character (U+FEFF), which a culture-sensitive comparison ignores.
Consider the following cases:
"".StartsWith("") // = true
"aa".StartsWith("") // = true
"aa".StartsWith("", StringComparison.Ordinal) // = true
So every string starts with an empty string. Now with byteOrderMarking:
var byteOrderMarking = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
byteOrderMarking.Equals("") // = False
byteOrderMarking.Equals("", StringComparison.CurrentCulture) // = True
byteOrderMarking.Equals("", StringComparison.Ordinal) // = False
Now we can see that byteOrderMarking is equal to an empty string only with a current-culture comparison. StartsWith(string) uses the current culture by default, so checking whether a string starts with byteOrderMarking is like checking whether it starts with an empty string, which is always true.
The difference between Ordinal and CurrentCulture is that the first compares the raw char values (UTF-16 code units) one by one, whereas the second applies the linguistic rules of the culture and can ignore characters such as U+FEFF.
Lastly, I suggest always using Ordinal (or OrdinalIgnoreCase) to compare technical strings.
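Applied to the method from the question, the fix could look like the sketch below. The BomHelper class name and the ByteOrderMark field are only illustrative and are not part of the original code; the only behavioural change is the explicit StringComparison.Ordinal argument.

```csharp
using System;
using System.Text;

// Hypothetical wrapper class, introduced only so the sketch compiles on its own.
public static class BomHelper
{
    // Decoded UTF-8 preamble: a string holding the single character U+FEFF.
    private static readonly string ByteOrderMark =
        Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());

    public static string PruneUtf8ByteMark(string xmlString)
    {
        // Ordinal compares char values directly, so U+FEFF is never
        // treated as an ignorable character, whatever the culture or OS.
        if (xmlString.StartsWith(ByteOrderMark, StringComparison.Ordinal))
        {
            xmlString = xmlString.Substring(ByteOrderMark.Length);
        }
        return xmlString;
    }
}
```

With the ordinal comparison, a string that starts with "<?xml" is returned unchanged, and only a real leading U+FEFF is removed.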