Trishen Trishen - 1 month ago 8
C# Question

Replacing characters in C# (ascii)

I got a file with characters like these: à, è, ì, ò, ù - À. What i need to do is replace those characters with normal characters eg: à = a, è = e and so on..... This is my code so far:

StreamWriter sw = new StreamWriter(@"C:/JoinerOutput.csv");
string path = @"C:/Joiner.csv";
string line = File.ReadAllText(path);

if (line.Contains("à"))
{
string asAscii = Encoding.ASCII.GetString(Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding(Encoding.ASCII.EncodingName, new EncoderReplacementFallback("a"), new DecoderExceptionFallback()), Encoding.UTF8.GetBytes(line)));
Console.WriteLine(asAscii);
Console.ReadLine();

sw.WriteLine(asAscii);
sw.Flush();
}


Basically this searches the file for a specific character and replaces it with another. The problem that i am having is that my if statement doesn't work. How do i go about solving this?

This is a sample of the input file:


Dimàkàtso Mokgàlo
Màmà Ràtlàdi
Koos Nèl
Pàsèkà Modisè
Jèrèmiàh Morèmi
Khèthiwè Buthèlèzi
Tiànà Pillày
Viviàn Màswàngànyè
Thirèshàn Rèddy
Wàdè Cornèlius
ènos Nètshimbupfè


This is the output if use : line = line.Replace('à', 'a'); :


Ch�rl�n� Kirst�n
M�m� R�tl�di
Koos N�l
P�s�k� Modis�
J�r�mi�h Mor�mi
Kh�thiw� Buth�l�zi
Ti�n� Pill�y
Vivi�n M�sw�ng�ny�
Thir�sh�n R�ddy
W�d� Corn�lius
�nos N�tshimbupf�


With my code the symbol will be removed completely

Answer

Don't know if it is useful but in an internal tool to write message on a led screen we have the following replacements (i'm sure that there are more intelligent ways to make this work for the unicode tables, but this one is enough for this small internal tool) :

        strMessage = Regex.Replace(strMessage, "[éèëêð]", "e");
        strMessage = Regex.Replace(strMessage, "[ÉÈËÊ]", "E");
        strMessage = Regex.Replace(strMessage, "[àâä]", "a");
        strMessage = Regex.Replace(strMessage, "[ÀÁÂÃÄÅ]", "A");
        strMessage = Regex.Replace(strMessage, "[àáâãäå]", "a");
        strMessage = Regex.Replace(strMessage, "[ÙÚÛÜ]", "U");
        strMessage = Regex.Replace(strMessage, "[ùúûüµ]", "u");
        strMessage = Regex.Replace(strMessage, "[òóôõöø]", "o");
        strMessage = Regex.Replace(strMessage, "[ÒÓÔÕÖØ]", "O");
        strMessage = Regex.Replace(strMessage, "[ìíîï]", "i");
        strMessage = Regex.Replace(strMessage, "[ÌÍÎÏ]", "I");
        strMessage = Regex.Replace(strMessage, "[š]", "s");
        strMessage = Regex.Replace(strMessage, "[Š]", "S");
        strMessage = Regex.Replace(strMessage, "[ñ]", "n");
        strMessage = Regex.Replace(strMessage, "[Ñ]", "N");
        strMessage = Regex.Replace(strMessage, "[ç]", "c");
        strMessage = Regex.Replace(strMessage, "[Ç]", "C");
        strMessage = Regex.Replace(strMessage, "[ÿ]", "y");
        strMessage = Regex.Replace(strMessage, "[Ÿ]", "Y");
        strMessage = Regex.Replace(strMessage, "[ž]", "z");
        strMessage = Regex.Replace(strMessage, "[Ž]", "Z");
        strMessage = Regex.Replace(strMessage, "[Ð]", "D");
        strMessage = Regex.Replace(strMessage, "[œ]", "oe");
        strMessage = Regex.Replace(strMessage, "[Œ]", "Oe");
        strMessage = Regex.Replace(strMessage, "[«»\u201C\u201D\u201E\u201F\u2033\u2036]", "\"");
        strMessage = Regex.Replace(strMessage, "[\u2026]", "...");

One thing to note is that if in most language the text is still understandable after such a treatment it's not always the case and will often force the reader to refer to the context of the sentence to be able to understand it. Not something you want if you have the choice.


Note that the correct solution would be to use the unicode tables, replacing characters with integrated diacritics with their "combined diacritical mark(s)"+character form and then removing the diacritics...

Comments