Dmitry Smolyaninov - 1 month ago
Java Question

Regex to replace all Turkish symbols with regular Latin symbols

I have a class that replaces all Turkish symbols with similar Latin symbols and passes the result to a searcher.

These are the methods for symbol replacement:

String replaceTurkish(String words) {
    if (checkWithRegExp(words)) {
        return words.toLowerCase().replaceAll("ç", "c").replaceAll("ğ", "g").replaceAll("ı", "i")
                .replaceAll("ö", "o").replaceAll("ş", "s").replaceAll("ü", "u");
    } else return words;
}

public static boolean checkWithRegExp(String word) {
    Pattern p = Pattern.compile("[öçğışü]");
    Matcher m = p.matcher(word);
    return m.matches();
}

But this always returns the unmodified words string.

What am I doing wrong?

Thanks in advance!


Per the Java 7 API, Matcher.matches():

Attempts to match the entire region against the pattern.

Your pattern is "[öçğışü]", which (an awesome resource) says will match

a single character in the list öçğışü literally

Perhaps you may see the problem already. Your regex is not going to match anything except a single Turkish character, since you are attempting to match the entire region against a regex which will only ever accept one character.
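To make the difference concrete, here is a small sketch comparing matches() with find() on the same pattern. The class name, helper names, and sample words are made up for illustration and are not from the question.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample words are illustrative only.
final class MatchesVsFindDemo {
    private static final Pattern TURKISH = Pattern.compile("[öçğışü]");

    // True only if the ENTIRE input is exactly one character from the class.
    static boolean wholeInputMatches(String word) {
        return TURKISH.matcher(word).matches();
    }

    // True if the pattern occurs anywhere inside the input.
    static boolean containsMatch(String word) {
        return TURKISH.matcher(word).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeInputMatches("ç"));   // true: the input is a single listed character
        System.out.println(wholeInputMatches("çay")); // false: three characters cannot match a one-character pattern
        System.out.println(containsMatch("çay"));     // true: 'ç' occurs in the input
        System.out.println(containsMatch("tea"));     // false: no listed character at all
    }
}
```

This is why the original check rejects every real word: any input longer than one character fails matches().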

I recommend either using find(), per the suggestion by Andreas in the comments, or using a regex like this:

.*[öçğışü].*

which should actually match words that contain any Turkish-specific characters.
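As a rough sketch of the find() variant, the two methods from the question could look like the following. The wrapper class name TurkishReplacer is hypothetical, and swapping replaceAll for the char-based replace is my own simplification, since no regex is needed to substitute single characters.

```java
import java.util.regex.Pattern;

// Sketch only: the question's methods with matches() swapped for find().
// The class name is hypothetical; the method names come from the question.
final class TurkishReplacer {
    // Compiled once and reused instead of recompiling on every call.
    private static final Pattern TURKISH = Pattern.compile("[öçğışü]");

    static boolean checkWithRegExp(String word) {
        // find() succeeds if the pattern occurs anywhere in the input.
        return TURKISH.matcher(word).find();
    }

    static String replaceTurkish(String words) {
        if (!checkWithRegExp(words)) {
            return words; // nothing to replace, return the input untouched
        }
        // replace(char, char) does a literal substitution, no regex involved.
        return words.toLowerCase()
                .replace('ç', 'c').replace('ğ', 'g').replace('ı', 'i')
                .replace('ö', 'o').replace('ş', 's').replace('ü', 'u');
    }
}
```

With this version, replaceTurkish("güneş") should come back as "gunes", while a word without any of the listed characters is returned as-is.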

Additionally, I'll point out that regexes are case-sensitive by default, so if there are upper-case variants of these letters, you should include those as well and modify your replace statements.
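One possible way to cover the upper-case variants is to skip the regex check entirely and map each character through a lookup table. This is only a sketch under my own assumptions: the class and method names are invented, it preserves case instead of lower-casing as the question does, and I have assumed that İ (capital dotted I) should map to I.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of the question's code.
final class TurkishFolding {
    private static final Map<Character, Character> LATIN = new HashMap<>();
    static {
        // Position i in 'from' maps to position i in 'to'.
        // Assumption: İ (U+0130) folds to plain I.
        String from = "çğıöşüÇĞİÖŞÜ";
        String to   = "cgiosuCGIOSU";
        for (int i = 0; i < from.length(); i++) {
            LATIN.put(from.charAt(i), to.charAt(i));
        }
    }

    // Replaces each Turkish-specific letter with its Latin look-alike, keeping case.
    static String toLatin(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            char mapped = LATIN.getOrDefault(c, c); // unmapped characters pass through
            out.append(mapped);
        }
        return out.toString();
    }
}
```

For example, toLatin("ÇAĞRI Şükrü") should give "CAGRI Sukru". If the searcher expects lower-case input, call toLowerCase() on the result afterwards.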

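Finally (edit): you can make your Pattern case-insensitive, but your replaceAll calls will still need to change to handle upper-case letters. Note that CASE_INSENSITIVE on its own only covers US-ASCII characters; for non-ASCII letters like these you also need to pass UNICODE_CASE. I am unsure how this behaves for every Turkish letter (the dotted and dotless i in particular), so you should test the flags before relying on them.

Pattern p = Pattern.compile(".*[öçğışü].*", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);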
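A quick way to test the flag before relying on it is to run a few inputs through the pattern with different flag combinations. The sketch below uses a made-up class name and a handful of sample inputs. It compares CASE_INSENSITIVE alone against CASE_INSENSITIVE combined with UNICODE_CASE, which the Pattern documentation says is required for case-insensitive matching beyond US-ASCII.

```java
import java.util.regex.Pattern;

// Hypothetical test harness; names and inputs are illustrative only.
final class CaseFlagCheck {
    // Reports whether the character class occurs in the input under the given flags.
    static boolean contains(String input, int flags) {
        return Pattern.compile("[öçğışü]", flags).matcher(input).find();
    }

    public static void main(String[] args) {
        int asciiOnly = Pattern.CASE_INSENSITIVE;
        int unicode = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        // Print both results side by side so any surprises are easy to spot.
        for (String s : new String[] {"Ö", "Ş", "Ç", "I", "i"}) {
            System.out.println(s + "  CASE_INSENSITIVE=" + contains(s, asciiOnly)
                    + "  +UNICODE_CASE=" + contains(s, unicode));
        }
    }
}
```

It is worth looking at the rows for plain I and i in particular: ı and I are case variants of each other in Unicode, so a case-insensitive class containing ı may end up flagging ordinary Latin words too.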