Elad Benda - 2 years ago
Java Question

Match a string from any language with a Java 8 regex

I am trying to match a string from any language with a Java 8 regex, as long as it contains only letters, digits, "." or "-".


String s = "בלה בלה";
String pattern= "^[\\p{L}\\p{Digit}_.-]*$";
return s.matches(pattern);


What am I missing? This code returns false for a valid Hebrew string.

Answer Source

Your string contains a space, which the character class in your pattern does not allow. You may add whitespace to your pattern, and use \w instead of \p{L}\p{Digit}_ while passing the Pattern.UNICODE_CHARACTER_CLASS flag:

String s = "בלה בלה";
String pattern= "(?U)[\\w\\s.-]*";
System.out.println(s.matches(pattern));
// => true
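
If you would rather keep the original \p{L}\p{Digit}_ classes, adding \s to them should be enough for this input. Here is a minimal sketch of that variant; the class name KeepOriginalClasses and the method isValid are made up for illustration, and the Hebrew strings are written with Unicode escapes so the file does not depend on the source encoding:

```java
public class KeepOriginalClasses {
    // The question's character class with \s added; \p{L} needs no (?U) flag.
    static final String PATTERN = "[\\p{L}\\p{Digit}\\s_.-]*";

    static boolean isValid(String s) {
        // String#matches requires the whole string to match, so no anchors.
        return s.matches(PATTERN);
    }

    public static void main(String[] args) {
        // The question's string: two Hebrew words separated by a space.
        String hebrew = "\u05D1\u05DC\u05D4 \u05D1\u05DC\u05D4";
        // A Hebrew letter followed by a vowel point (a combining mark).
        String pointed = "\u05D1\u05B8";

        System.out.println(isValid(hebrew));   // true
        System.out.println(isValid("a,b"));    // false: the comma is not in the class
        System.out.println(isValid(pointed));  // false: \p{L} does not cover combining marks
        System.out.println(pointed.matches("(?U)[\\w\\s.-]*")); // true: \w in (?U) mode does
    }
}
```

The last two lines are the practical difference between the two variants: with (?U), \w also accepts combining marks such as Hebrew vowel points, while \p{L} accepts letters only.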


Since the pattern is used with the String#matches() method, which requires the whole string to match, the ^ and $ anchors are not necessary. If you plan to use the pattern with the Matcher#find() method, enclose the pattern within anchors as in the original code ("^(?U)[\\w\\s.-]*$").
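
To see why the anchors matter with find(), here is a small sketch that runs the same character class three ways on an invalid input. The class name AnchorsDemo and its helper methods are hypothetical names, not part of any library:

```java
import java.util.regex.Pattern;

public class AnchorsDemo {
    static final Pattern UNANCHORED = Pattern.compile("(?U)[\\w\\s.-]*");
    static final Pattern ANCHORED = Pattern.compile("^(?U)[\\w\\s.-]*$");

    // find() succeeds as soon as any substring (even an empty one) matches.
    static boolean findUnanchored(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // With ^ and $, find() can only succeed if the whole input matches.
    static boolean findAnchored(String s) {
        return ANCHORED.matcher(s).find();
    }

    // matches() always tests the entire input, anchors or not.
    static boolean fullMatch(String s) {
        return UNANCHORED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String invalid = "a,b"; // the comma is not allowed by the class
        System.out.println(findUnanchored(invalid)); // true
        System.out.println(findAnchored(invalid));   // false
        System.out.println(fullMatch(invalid));      // false
    }
}
```

Without anchors, find() reports a match on almost any input because the * quantifier also accepts an empty match, so it is not a usable validity check on its own.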

Pattern details:

  • (?U) - the embedded form of the Pattern.UNICODE_CHARACTER_CLASS flag, which makes the shorthand character classes Unicode-aware (the java.util.regex.Pattern Javadoc lists what \w matches in this mode)
  • [\\w\\s.-]* - zero or more:
    • \w - word chars (letters, digits, _ and other connector punctuation, plus combining marks)
    • \s - whitespaces
    • . - a dot (no need to escape it inside a character class)
    • - - a hyphen (no need to escape it, as it is at the end of the character class)
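
As a rough illustration of what (?U) changes, the sketch below compiles the same character class with and without the flag. It passes Pattern.UNICODE_CHARACTER_CLASS to Pattern.compile instead of embedding (?U), which should behave the same way; the class name UnicodeFlagDemo and its methods are invented for this example:

```java
import java.util.regex.Pattern;

public class UnicodeFlagDemo {
    // Without the flag, \w only covers ASCII letters, digits and underscore.
    static final Pattern ASCII_ONLY = Pattern.compile("[\\w\\s.-]*");

    // Same class, with the flag passed to compile() rather than written as (?U).
    static final Pattern UNICODE_AWARE =
            Pattern.compile("[\\w\\s.-]*", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean asciiMatch(String s) {
        return ASCII_ONLY.matcher(s).matches();
    }

    static boolean unicodeMatch(String s) {
        return UNICODE_AWARE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // The question's string, written with Unicode escapes.
        String hebrew = "\u05D1\u05DC\u05D4 \u05D1\u05DC\u05D4";
        System.out.println(asciiMatch(hebrew));        // false
        System.out.println(unicodeMatch(hebrew));      // true
        System.out.println(asciiMatch("abc 1-2.x"));   // true
        System.out.println(unicodeMatch("abc 1-2.x")); // true
    }
}
```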