Michael Liu Michael Liu - 3 years ago 104
C# Question

Length of substring matched by culture-sensitive String.IndexOf method

I tried writing a culture-aware string replacement method:

public static string Replace(string text, string oldValue, string newValue)
{
int index = text.IndexOf(oldValue, StringComparison.CurrentCulture);
return index >= 0
? text.Substring(0, index) + newValue + text.Substring(index + oldValue.Length)
: text;
}


However, it chokes on Unicode combining characters:

// \u0301 is Combining Acute Accent
Console.WriteLine(Replace("déf", "é", "o")); // 1. CORRECT: dof
Console.WriteLine(Replace("déf", "e\u0301", "o")); // 2. INCORRECT: do
Console.WriteLine(Replace("de\u0301f", "é", "o")); // 3. INCORRECT: dóf


To fix my code, I need to know that in the second example,
String.IndexOf
matched only one character (
é
) even though it searched for two (
e\u0301
). Similarly, I need to know that in the third example,
String.IndexOf
matched two characters (
e\u0301
) even though it only searched for one (
é
).

How can I determine the actual length of the substring matched by
String.IndexOf
?

NOTE: Performing Unicode normalization on
text
and
oldValue
(as suggested by James Keesey) would accommodate combining characters, but ligatures would still be a problem:

Console.WriteLine(Replace("œf", "œ", "i")); // 4. CORRECT: if
Console.WriteLine(Replace("œf", "oe", "i")); // 5. INCORRECT: i
Console.WriteLine(Replace("oef", "œ", "i")); // 6. INCORRECT: ief

Answer Source

You will need to directly call FindNLSString or FindNLSStringEx yourself. String.IndexOf uses FindNLSStringEx but all the information you need is available in FindNLSString.

Here is an example of how to rewrite your Replace method that works against your test cases. Note that I am using the current user locale read up the API documentation if you want to use the system locale or provide your own. I am also passing in 0 for the flags which means it will use the default string comparison options for the locale, again the documentation can help you provide different options.

public const int LOCALE_USER_DEFAULT = 0x0400;

[DllImport("kernel32.dll", SetLastError = true, ExactSpelling = true)]
internal static extern int FindNLSString(int locale, uint flags, [MarshalAs(UnmanagedType.LPWStr)] string sourceString, int sourceCount, [MarshalAs(UnmanagedType.LPWStr)] string findString, int findCount, out int found);

public static string ReplaceWithCombiningCharSupport(string text, string oldValue, string newValue)
{
    int foundLength;
    int index = FindNLSString(LOCALE_USER_DEFAULT, 0, text, text.Length, oldValue, oldValue.Length, out foundLength);
    return index >= 0 ? text.Substring(0, index) + newValue + text.Substring(index + foundLength) : text;
}
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download