JensD - 1 year ago
Java Question

Regex understand \b

I am struggling to understand word boundary \b in regex.
I read that there are three conditions for \b.

  • Before the first character in the string, if the first character is a
    word character.

  • After the last character in the string, if the last character is a
    word character.

  • Between two characters in the string, where one is a word character
    and the other is not a word character.
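The three conditions above can be checked by hand. Here is a minimal sketch, assuming ASCII input where a word character is a letter, digit, or underscore. The class name BoundaryRules and its methods are hypothetical, introduced only for illustration. It computes the boundary positions directly from the rules and compares them with what \b finds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryRules {
    // Assumption: ASCII input, where a word character is a letter, digit, or underscore.
    static boolean isWordChar(char c) {
        return c == '_' || (c >= '0' && c <= '9')
                || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // A position is a gap between characters: 0 is before the first character,
    // s.length() is after the last one. A gap is a boundary when exactly one
    // of its two neighbours is a word character (a missing neighbour counts as non-word).
    static List<Integer> byRules(String s) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i <= s.length(); i++) {
            boolean before = i > 0 && isWordChar(s.charAt(i - 1));
            boolean after = i < s.length() && isWordChar(s.charAt(i));
            if (before != after) {
                positions.add(i);
            }
        }
        return positions;
    }

    // The same positions as reported by the regex engine.
    static List<Integer> byRegex(String s) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile("\\b").matcher(s);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String s = "ab, cd!";
        System.out.println(byRules(s)); // prints [0, 2, 4, 6]
        System.out.println(byRegex(s)); // prints [0, 2, 4, 6]
    }
}
```

Note that \b matches a position, not a character, so every match is zero-length and start() reports the index of the gap.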

I am trying to find the start index of each match using the Java method start().

import java.util.regex.*;
class Quetico {
    public static void main(String[] args) {
        Pattern p = Pattern.compile(args[0]);
        Matcher m = p.matcher(args[1]);
        System.out.print("match positions: ");
        // start() is only valid after a successful find()
        while (m.find()) {
            System.out.print(m.start() + " ");
        }
        System.out.println();
    }
}

% java Quetico "\b" "^23 *$76 bc"

//string: ^23 *$76 bc pattern:\b
//index : 01234567890

produces: 1 3 5 6 7 9

I'm having trouble understanding why it produces this result, because I can't see the pattern. I've tried looking at the inverse, \B, which produces 0 2 4 8, but that doesn't make it any clearer for me. If you can help clarify this, it would be appreciated.

ajb

The issue isn't Java here, it's the Linux/Unix shell. When you put text between double quote marks on the command line, most of the special shell characters such as *, ?, etc. are no longer special--except for variable interpolation. (And some other things, like !, depending on which shell flavor you're using.) Thus, if you say

% command "this $variable is interesting"

and you've set variable to value, your command will be called with one argument, this value is interesting. In your case, the shell treats $7 as a positional parameter (the seventh argument to a shell script), even though you're not in a shell script; since this isn't set to anything, it's replaced with an empty string, leaving only the 6 behind, and the result is the same as if you had run

% java Quetico "\b" "^23 *6 bc"

which gives me 1 3 5 6 7 9 if I use that string literal in a Java program (instead of on the command line).
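To take the shell out of the picture, both strings can be put into a Java program as literals. This is a minimal sketch; the class name QueticoLiteral and the helper starts are hypothetical names used only for illustration. Note that in Java source the pattern has to be written as "\\b", while $ needs no escaping inside a string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueticoLiteral {
    // Collects the start index of every match of regex in input.
    static List<Integer> starts(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        String intended = "^23 *$76 bc";  // what was typed on the command line
        String afterShell = "^23 *6 bc";  // what remains once $7 expands to nothing

        System.out.println(starts("\\b", afterShell)); // prints [1, 3, 5, 6, 7, 9]
        System.out.println(starts("\\B", afterShell)); // prints [0, 2, 4, 8]
        System.out.println(starts("\\b", intended));   // prints [1, 3, 6, 8, 9, 11]
    }
}
```

The first two lines reproduce the numbers from the question, and the third shows what \b reports for the string that was actually intended.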

To prevent $ from being interpreted by the shell, you need to use single quote marks:

% java Quetico "\b" '^23 *$76 bc'
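If you're ever unsure what the shell actually handed to your program, one way to check is to print the arguments exactly as Java received them. This is a minimal sketch; the class ShowArgs and its describe method are hypothetical names, not part of the original program.

```java
public class ShowArgs {
    // Wraps each argument in angle brackets so that spaces and
    // empty strings are visible in the output.
    static String describe(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < args.length; i++) {
            sb.append("args[").append(i).append("] = <")
              .append(args[i]).append(">\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(args));
    }
}
```

Run with the double-quoted argument from the question, a shell that expands $7 to an empty string would be expected to show args[1] = <^23 *6 bc>; with single quotes the $76 should survive intact.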