Avión Avión - 18 days ago 5
Java Question

Removing non-alphanumerics but keeping Latin characters

From an input string I would like to get rid of the non-alphanumeric characters (":", "-", etc.) but maintain Latin characters. Also replace the blank spaces " " with "-".

This is my attempt, but I don't know how to maintain the Latin characters.

String title ="NEYÑO: HOW ARE YÓU MATE";
title = title.replaceAll("[^A-Za-z0-9 ]", "").replace(" ", "-").toLowerCase();
System.out.println(title);


Output:

neyo-how-are-yu-mate


Desired output:

neyño-how-are-yóu-mate


Thanks in advance

Answer

Use [^\p{Alnum}\s]+ with the Pattern.UNICODE_CHARACTER_CLASS option to keep all Unicode letters and digits:

String title ="NEYÑO: HOW ARE YÓU MATE";
title = title.replaceAll("(?U)[^\\p{Alnum}\\s]+", "").replace(" ", "-").toLowerCase();
System.out.println(title); // => neyño-how-are-yóu-mate


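Details:

(?U) is the inline form of the Pattern.UNICODE_CHARACTER_CLASS flag. With it, \p{Alnum} and \s match Unicode letters, digits and whitespace instead of only their ASCII counterparts.
[^\p{Alnum}\s]+ matches one or more characters that are neither alphanumeric nor whitespace, so punctuation such as ":" is removed while letters like Ñ and Ó are kept.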
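For reuse, the same idea can be wrapped in a helper. The following is only a sketch, not part of the original answer: the class name SlugDemo and the method slugify are made up for illustration. It passes Pattern.UNICODE_CHARACTER_CLASS as a compile flag instead of the inline (?U), and, as an extra assumption about what is wanted, collapses each run of whitespace into a single hyphen and lowercases with Locale.ROOT.

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class SlugDemo {
    // Anything that is neither a Unicode letter/digit nor whitespace.
    // The flag is the non-inline equivalent of the (?U) prefix.
    private static final Pattern NON_ALNUM =
            Pattern.compile("[^\\p{Alnum}\\s]+", Pattern.UNICODE_CHARACTER_CLASS);

    // One or more whitespace characters in a row.
    private static final Pattern WHITESPACE_RUN =
            Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);

    // Hypothetical helper: strips punctuation, hyphenates whitespace, lowercases.
    static String slugify(String title) {
        String cleaned = NON_ALNUM.matcher(title).replaceAll("");
        String hyphenated = WHITESPACE_RUN.matcher(cleaned.trim()).replaceAll("-");
        // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i).
        return hyphenated.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(slugify("NEYÑO: HOW ARE YÓU MATE")); // neyño-how-are-yóu-mate
        System.out.println(slugify("A - B"));                   // a-b
    }
}
```

Note the difference on "A - B": removing the "-" leaves two adjacent spaces, so a plain replace(" ", "-") would give "a--b", while collapsing whitespace runs gives "a-b". Keep the one-liner from the answer if doubled hyphens are acceptable.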
