A.D. - 12 days ago
C# Question

RegexOptions.CultureInvariant not finding matches for accents

I would like to create a regex that ignores accents.

For instance:

string s = "I am an old élephant";
string pattern = "elephant";
bool result = new Regex(pattern, RegexOptions.CultureInvariant).IsMatch(s);


My culture when I test is:

System.Globalization.CultureInfo.CurrentCulture = fr-FR


So I would have expected this code to find a match but it does not.

Is there an easy way to get a match for this?

I am trying to make a StringReplace overload method that would replace élèphânt with elephant and so on.

Answer

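RegexOptions.CultureInvariant only changes how case-insensitive comparisons are performed; it does not make the regex ignore accents. Strip the diacritics from the input first, using the following method: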

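    // Requires: using System.Globalization; using System.Text;
    public string removeDiacritics(string str)
    {
        var sb = new StringBuilder();

        // FormD splits each accented character into a base character
        // followed by one or more combining marks.
        foreach (char c in str.Normalize(NormalizationForm.FormD))
        {
            // Keep everything except the combining marks (the accents).
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
            }
        }
        // Recompose whatever is left into the usual composed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }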

Then the match succeeds:

        string s = "I am an old élephant";
        string pattern = "elephant";
        bool result = new Regex(pattern, RegexOptions.IgnoreCase).IsMatch(removeDiacritics(s)); //true
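To see why the method above removes the accent, it can help to look at what FormD normalization actually produces. The following is only an illustrative sketch; the class name `NormalizationDemo` and the method `Describe` are made up for this example and are not part of the original answer.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class NormalizationDemo
{
    // Hypothetical helper: lists each char of the FormD-normalized input
    // as "U+XXXX:UnicodeCategory", separated by spaces.
    public static string Describe(string str)
    {
        var sb = new StringBuilder();
        foreach (char c in str.Normalize(NormalizationForm.FormD))
        {
            sb.AppendFormat("U+{0:X4}:{1} ", (int)c, CharUnicodeInfo.GetUnicodeCategory(c));
        }
        return sb.ToString().TrimEnd();
    }
}

// Usage: "\u00E9" is the precomposed letter é.
// NormalizationDemo.Describe("\u00E9")
//   returns "U+0065:LowercaseLetter U+0301:NonSpacingMark"
```

The precomposed é becomes a plain `e` followed by a combining acute accent, and the `NonSpacingMark` check in removeDiacritics is what drops the second char.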

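If you need to replace text rather than just test for a match, run the regex against the stripped string, then iterate backward through the MatchCollection and edit your original string using the Index and Length of each match. Going backward keeps the indexes of the earlier matches valid while the string changes. This assumes the stripped string has the same length as the original, which generally holds when every accented character in the original is a single precomposed char.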
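The backward-iteration idea could be sketched roughly as follows. This is an untested-in-production sketch, not the answerer's code: the class `AccentInsensitive` and the method `ReplaceIgnoringAccents` are hypothetical names, the replacement is inserted literally (no `$1`-style substitutions), and the length check is only a simple guard for the assumption that indexes in the stripped string line up with the original.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class AccentInsensitive
{
    // Same technique as removeDiacritics in the answer, made static here.
    public static string RemoveDiacritics(string str)
    {
        var sb = new StringBuilder();
        foreach (char c in str.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    // Hypothetical helper: finds matches in the accent-stripped text and
    // replaces the corresponding spans of the original text.
    public static string ReplaceIgnoringAccents(string input, string pattern, string replacement)
    {
        string stripped = RemoveDiacritics(input);

        // Assumption: both strings have the same length, so match
        // indexes can be reused on the original. Bail out otherwise.
        if (stripped.Length != input.Length)
            throw new ArgumentException("Input does not map one-to-one onto its stripped form.", nameof(input));

        MatchCollection matches = Regex.Matches(stripped, pattern, RegexOptions.IgnoreCase);
        var result = new StringBuilder(input);

        // Walk backward so edits do not shift the indexes of earlier matches.
        for (int i = matches.Count - 1; i >= 0; i--)
        {
            Match m = matches[i];
            result.Remove(m.Index, m.Length).Insert(m.Index, replacement);
        }
        return result.ToString();
    }
}

// Usage (escapes spell out "élèphânt" with precomposed letters):
// AccentInsensitive.ReplaceIgnoringAccents(
//     "I am an old \u00E9l\u00E8ph\u00E2nt", "elephant", "elephant")
//   returns "I am an old elephant"
```

Throwing on a length mismatch is a deliberately conservative choice; input that is already in decomposed form would need an explicit index mapping between the two strings instead.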