d.moncada - 4 years ago
C# Question

UTF-8 Byte Mark check gives different value based on operating system

We have some unit-tests that are checking UTF-8 byte marking of an XML string before it's loaded into an XmlDocument. Everything works fine using Windows 7 64-bit, but we noticed a bunch of tests failing while trying to run under Windows 10 64-bit.

After a bit of investigation, we found that the XML string is getting pruned on Windows 10 (the check reports that the preamble exists), while on Windows 7 it is not.

Here is the code snippet:

public static string PruneUtf8ByteMark(string xmlString)
{
    var byteOrderMarking = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
    if (xmlString.StartsWith(byteOrderMarking))
    {
        xmlString = xmlString.Remove(0, byteOrderMarking.Length);
    }

    return xmlString;
}

The StartsWith check is returning true on Windows 10 and false on Windows 7. Note that the same XML string is being used; the only difference here is the OS.

Any ideas? We are a bit lost here, since both PCs are x64 running the same .NET version.

The string comes from a class via:

public static string XmlString = "<?xml version=\"1.0\"....

On Windows 10, the leading less-than sign gets removed because the byte mark check is true.

Answer

The problem is caused by a culture-sensitive comparison.

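The byteOrderMarking string holds a single zero-width character (U+FEFF). It is not a visible character, so a culture-sensitive comparison can treat it as ignorable and effectively skip it.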

See the following cases:

"".StartsWith("") // = true
"aa".StartsWith("") // = true 
"aa".StartsWith("", StringComparison.Ordinal) // = true

So every string starts with an empty string. Now with byteOrderMarking:

var byteOrderMarking = Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
byteOrderMarking.Equals("") // = False
byteOrderMarking.Equals("", StringComparison.CurrentCulture) // = True
byteOrderMarking.Equals("", StringComparison.Ordinal) // = False

Now we can see that byteOrderMarking equals an empty string only under a current-culture comparison. So checking whether a string starts with byteOrderMarking is like checking whether it starts with an empty string, which is always true.

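The difference between Ordinal and CurrentCulture is that the first is a byte-to-byte comparison (it compares the UTF-16 code units of the two strings), whereas the second applies the linguistic rules of the current culture, under which some characters are ignored. Those rules come from sorting tables that can differ between operating system versions, which would explain why the same code behaves differently on Windows 7 and Windows 10.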
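To make the byte-to-byte point concrete, here is a minimal sketch of what an ordinal check sees. The names BomInspection, GetBomString and StartsWithBomOrdinal are made up for this illustration and are not part of the question's code.

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class BomInspection
{
    // Decodes the UTF-8 preamble (bytes EF BB BF) into a .NET string.
    // GetString does not strip the preamble, so the result is the
    // single character U+FEFF.
    public static string GetBomString()
    {
        return Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());
    }

    // Ordinal check: true only if the text really begins with the
    // code unit U+FEFF, independent of culture and OS sorting tables.
    public static bool StartsWithBomOrdinal(string text)
    {
        return text.StartsWith(GetBomString(), StringComparison.Ordinal);
    }
}
```

With this, BomInspection.GetBomString().Length is 1, which is why Remove(0, byteOrderMarking.Length) in the question strips exactly one character (the "<") whenever the culture-sensitive check wrongly succeeds. StartsWithBomOrdinal("<?xml version=\"1.0\"?>") returns false, while StartsWithBomOrdinal("\uFEFF<a/>") returns true.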

Lastly, I suggest always using Ordinal (or OrdinalIgnoreCase) when comparing technical strings.
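Applied to the method from the question, that suggestion could look like the sketch below. It assumes the only change needed is passing StringComparison.Ordinal to StartsWith. The wrapper class name XmlBomHelper and the null check are additions for this example, not part of the original code.

```csharp
using System;
using System.Text;

// Hypothetical wrapper class; only PruneUtf8ByteMark comes from the question.
public static class XmlBomHelper
{
    // The UTF-8 preamble decoded to a string (the single character U+FEFF).
    private static readonly string ByteOrderMarking =
        Encoding.UTF8.GetString(Encoding.UTF8.GetPreamble());

    public static string PruneUtf8ByteMark(string xmlString)
    {
        // Added for this sketch: fail early on null input.
        if (xmlString == null)
        {
            throw new ArgumentNullException(nameof(xmlString));
        }

        // Ordinal comparison looks at the raw code units, so the result
        // does not depend on the current culture or the OS version.
        if (xmlString.StartsWith(ByteOrderMarking, StringComparison.Ordinal))
        {
            xmlString = xmlString.Remove(0, ByteOrderMarking.Length);
        }

        return xmlString;
    }
}
```

A string such as "<?xml version=\"1.0\"?>" should come back unchanged, while "\uFEFF<a/>" should come back as "<a/>".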
