Martin - 7 months ago
Java Question

Is there a way to get rid of accents and convert a whole string to regular letters?

Is there a better way to get rid of accents and turn those letters into plain ones than calling the String.replaceAll() method and replacing the letters one by one?
Example:

Input:
orčpžsíáýd


Output:
orcpzsiayd


It doesn't need to handle every accented letter, or other scripts such as Russian or Chinese.

Answer

Use java.text.Normalizer to handle this for you.

string = Normalizer.normalize(string, Normalizer.Form.NFD);

This decomposes each accented letter into its base character followed by separate combining accent marks. Then you just need to throw out every character that isn't plain ASCII, which removes the marks:

string = string.replaceAll("[^\\p{ASCII}]", "");
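Put together, the two steps above can be sketched as one small helper. The class and method names (AccentStripper, stripAccents) are made up for illustration; only Normalizer.normalize and String.replaceAll come from the answer.

```java
import java.text.Normalizer;

// Hypothetical helper class; the name is illustrative, not from the answer.
class AccentStripper {

    // Decomposes accented letters (NFD), then deletes every non-ASCII
    // character, which includes the combining accent marks.
    static String stripAccents(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // The question's sample input, written with Unicode escapes so the
        // source compiles regardless of file encoding.
        String input = "or\u010Dp\u017Es\u00ED\u00E1\u00FDd";
        System.out.println(stripAccents(input)); // prints orcpzsiayd
    }
}
```

Note that this also deletes letters that have no decomposition, such as ł or ø, because they are still non-ASCII after normalization.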

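If your text contains non-ASCII characters that you want to keep (the replacement above deletes every non-ASCII character, not only the accents), remove just the combining marks instead: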

string = string.replaceAll("\\p{M}", "");

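In a Unicode-aware regular expression, \\P{M} (uppercase) matches any code point that is not a combining mark, i.e. the base glyph, and \\p{M} (lowercase) matches each combining mark, i.e. each accent.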
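As a rough sketch of the \\p{M} variant, the helper below strips only the combining marks and leaves other non-ASCII text alone. The names MarkStripper and stripMarks are hypothetical, and the sample string (an accented Latin word followed by a Chinese character) is an assumption chosen for illustration.

```java
import java.text.Normalizer;

// Hypothetical helper class; the name is illustrative, not from the answer.
class MarkStripper {

    // Decomposes accented letters (NFD), then removes only the combining
    // marks, so characters from other scripts are kept.
    static String stripMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // "caj" with a caron on the c, a space, then one Chinese character,
        // written with Unicode escapes to stay independent of file encoding.
        String input = "\u010Daj \u8336";
        // The caron is removed; the Chinese character survives.
        System.out.println(stripMarks(input).equals("caj \u8336")); // prints true
    }
}
```

One caveat: the result stays in decomposed form, and letters from other scripts that NFD splits into a base plus a mark lose that mark as well.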

Thanks to GarretWilson for the pointer and to regular-expressions.info for the great Unicode guide.