MontrealDevOne - 3 days ago
Java Question

Java Regex: How to match one or more space characters

How do you match more than one space character in Java regex?

I have a regex that I am trying to match, but it fails when the input contains two or more space characters.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    public static void main(String[] args) {
        String pattern = "\\b(fruit)\\s+([^a]+\\w+)\\b"; // Match 'fruit' not followed by a word that begins with 'a'
        String str = "fruit apple";       // One space character: not matched
        String str_fail = "fruit  apple"; // Two space characters: matched
        System.out.println(preg_match(pattern, str));      // false (that's what I want)
        System.out.println(preg_match(pattern, str_fail)); // true (the regex fails)
    }

    // Returns true if the pattern matches anywhere in the subject.
    public static boolean preg_match(String pattern, String subject) {
        Pattern regex = Pattern.compile(pattern);
        Matcher regexMatcher = regex.matcher(subject);
        return regexMatcher.find();
    }
}

Answer

The problem is actually caused by backtracking. Your regex:

 "\\b(fruit)\\s+([^a]+\\w+)\\b"

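Says "fruit, followed by one or more spaces, followed by one or more non-'a' characters, followed by one or more word characters". Note that [^a] matches any character other than 'a', including whitespace. The reason this fails with two spaces is backtracking: \s+ first matches both spaces, which leaves [^a]+ facing the 'a' of "apple", so that attempt fails. The engine then makes \s+ give back the second space, which satisfies [^a]+, while \s+ keeps only the first; \w+ then matches "apple" and the whole pattern succeeds. With a single space there is nothing for \s+ to give back, so the match fails as intended.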
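To see the backtracking directly, here is a small sketch that wraps \s+, [^a]+ and \w+ in their own capturing groups and prints what each one ended up with. The class name BacktrackDemo and the helper split are hypothetical, made up for this illustration; only java.util.regex is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BacktrackDemo {
    // The question's pattern, with \s+, [^a]+ and \w+ each in its own
    // capturing group so we can see how the two spaces were divided up.
    static final Pattern GROUPED = Pattern.compile("\\b(fruit)(\\s+)([^a]+)(\\w+)\\b");

    // Hypothetical helper: returns groups 2..4 of the first match, or null if there is no match.
    static String[] split(String subject) {
        Matcher m = GROUPED.matcher(subject);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] parts = split("fruit  apple"); // two spaces
        // Brackets make the captured whitespace visible.
        System.out.println("\\s+   -> [" + parts[0] + "]");   // [ ]  (only one space kept)
        System.out.println("[^a]+  -> [" + parts[1] + "]");   // [ ]  (the space given back)
        System.out.println("\\w+   -> [" + parts[2] + "]");   // [apple]
        System.out.println(split("fruit apple") == null);     // true: one space, no match
    }
}
```

Each of the first two groups holds a single space: \s+ had to give one up so that [^a]+ could match.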

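I think you can fix it by simply using the possessive quantifier instead, which would be \s++. A possessive quantifier never gives back characters it has matched, so \s++ keeps both spaces, [^a]+ is left facing the 'a', and the match fails as intended. You can find the documentation on Java's quantifiers (greedy, reluctant and possessive) in the java.util.regex.Pattern Javadoc.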
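A minimal sketch comparing the original greedy \s+ with the suggested possessive \s++ on a few inputs. It also includes the same idea written as an independent (atomic) group, (?>\s+), which java.util.regex.Pattern supports as an equivalent spelling. The class name PossessiveDemo and the helper found are hypothetical, for illustration only.

```java
import java.util.regex.Pattern;

public class PossessiveDemo {
    // The question's original pattern: greedy \s+ may give back spaces when backtracking.
    static final String GREEDY = "\\b(fruit)\\s+([^a]+\\w+)\\b";
    // Suggested fix: possessive \s++ never gives back the spaces it matched.
    static final String POSSESSIVE = "\\b(fruit)\\s++([^a]+\\w+)\\b";
    // Same behaviour expressed with an independent (atomic) group.
    static final String ATOMIC = "\\b(fruit)(?>\\s+)([^a]+\\w+)\\b";

    // Hypothetical helper mirroring the question's preg_match.
    static boolean found(String pattern, String subject) {
        return Pattern.compile(pattern).matcher(subject).find();
    }

    public static void main(String[] args) {
        String[] subjects = { "fruit apple", "fruit  apple", "fruit   apple", "fruit banana" };
        for (String s : subjects) {
            // Spaces are shown as underscores so the count is visible.
            System.out.println(s.replace(' ', '_')
                    + ": greedy=" + found(GREEDY, s)
                    + ", possessive=" + found(POSSESSIVE, s)
                    + ", atomic=" + found(ATOMIC, s));
        }
    }
}
```

With the possessive (or atomic) version, the "apple" inputs no longer match however many spaces precede them, while "fruit banana" still matches.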


As an illustration, here are two examples on Rubular:

  1. Using the possessive quantifier on \s (gives the expected results, based on what you describe)
  2. Your current regex with separate groupings around [^a]+ and \w+. Notice that the second match group (representing [^a]+) captures the second space character.