peter.murray.rust - 3 months ago
Java Question

What is a word boundary in regexes?

I am using Java regexes in Java 1.6 (inter alia to parse numeric output) and cannot find a precise definition of \b ("word boundary"). I had assumed that -12 would be an "integer word" (matched by \b\-?\d+\b) but it appears that this does not work. I'd be grateful to know of ways of matching space-separated numbers.

Example:

// Pattern with a word boundary before the optional minus sign
Pattern pattern = Pattern.compile("\\s*\\b\\-?\\d+\\s*");
String plus = " 12 ";
System.out.println(pattern.matcher(plus).matches());
String minus = " -12 ";
System.out.println(pattern.matcher(minus).matches());
// Same pattern without the word boundary
pattern = Pattern.compile("\\s*\\-?\\d+\\s*");
System.out.println(pattern.matcher(minus).matches());


This returns:

true
false
true

Answer

A word boundary, in most regex dialects, is a position between a word character (\w) and a non-word character (\W), or at the beginning or end of a string if it begins or ends (respectively) with a word character ([0-9A-Za-z_]).
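To see this definition in action, here is a minimal sketch that lists every index at which \b matches in a given string. The class name WordBoundaryDemo and the helper boundaryPositions are made up for illustration; only java.util.regex calls from the standard library are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class WordBoundaryDemo {

    // Returns every index in the input at which the zero-width \b assertion matches.
    static List<Integer> boundaryPositions(String input) {
        List<Integer> positions = new ArrayList<Integer>();
        Matcher m = Pattern.compile("\\b").matcher(input);
        // find() advances past each empty match, so this loop terminates.
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Boundaries sit at the edges of "ab" and "12".
        System.out.println(boundaryPositions("ab 12")); // prints [0, 2, 3, 5]
        // No word characters at all, so there are no boundaries.
        System.out.println(boundaryPositions("--"));    // prints []
    }
}
```

Note that a string made up only of non-word characters has no word boundaries at all, not even at its start or end.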

So, in the string "-12", \b matches before the 1 or after the 2. The dash is not a word character, so in " -12 " there is no word boundary between the space and the dash, and \b\-? cannot match there.
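For the original goal of matching space-separated numbers, one possible approach (a sketch, not the only way) is to replace \b with whitespace lookarounds: (?<!\S) asserts that the token is not preceded by a non-whitespace character, and (?!\S) asserts that it is not followed by one. The class name SpaceSeparatedIntegers and the method extract are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
class SpaceSeparatedIntegers {

    // An optional minus sign and digits, delimited by whitespace or the string edges.
    private static final Pattern INTEGER_TOKEN =
            Pattern.compile("(?<!\\S)-?\\d+(?!\\S)");

    // Returns the integer tokens as strings, leaving any parsing to the caller.
    static List<String> extract(String input) {
        List<String> tokens = new ArrayList<String>();
        Matcher m = INTEGER_TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "x-3" and "4a" are skipped because they touch non-whitespace characters.
        System.out.println(extract(" 12 -12 x-3 4a 7")); // prints [12, -12, 7]
    }
}
```

Unlike \b, these lookarounds treat the minus sign as part of the token, so "-12" is accepted while digits embedded in a larger token are not.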
