Zaphod Beeblebrox - 2 months ago
Java Question

Regular expression to match optional end of string

Given the following:

"John Smith"
"John Smith (123)"
"John Smith (123) (456)"


I'd like to capture:

"John Smith"
"John Smith", "123"
"John Smith (123)", "456"


What Java regex would allow me to do that?

I've tried
(.+)\s\((\d+)\)$
and it works fine for "John Smith (123)" and "John Smith (123) (456)" but not for "John Smith". How can I change the regex to work for the first input as well?

Answer

You may turn the first .+ lazy, and wrap the later part with a non-capturing optional group:

(.+?)(?:\s\((\d+)\))?$
   ^ ^^^           ^^ 

See the regex demo

Actually, if you are using the regex with String#matches() (or Matcher#matches()), the last $ is redundant, because matches() only succeeds when the entire input matches the pattern.
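To illustrate the remark about the trailing $, here is a minimal sketch comparing matches() with find(). The class name AnchorDemo and the helper methods firstFind and fullMatch are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchorDemo {
    // The pattern from the answer, with and without the trailing $.
    static final Pattern ANCHORED = Pattern.compile("(.+?)(?:\\s\\((\\d+)\\))?$");
    static final Pattern UNANCHORED = Pattern.compile("(.+?)(?:\\s\\((\\d+)\\))?");

    // Hypothetical helper: group 1 of the first find() hit, or null if nothing is found.
    static String firstFind(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // Hypothetical helper: group 1 when the whole input matches, or null otherwise.
    static String fullMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "John Smith (123)";
        System.out.println(fullMatch(UNANCHORED, s)); // John Smith
        System.out.println(fullMatch(ANCHORED, s));   // John Smith
        System.out.println(firstFind(ANCHORED, s));   // John Smith
        System.out.println(firstFind(UNANCHORED, s)); // J
    }
}
```

With find() and no $, the lazy .+? is satisfied by a single character and the optional group is simply skipped, so the anchor only becomes redundant when the whole input has to match.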

Details:

  • (.+?) - Group 1 capturing one or more characters other than a linebreak symbol, as few as possible (thus, allowing the subsequent subpattern to "fall" into a group)
  • (?:\s\((\d+)\))? - an optional sequence of a whitespace, (, Group 2 capturing 1+ digits and a )
  • $ - end of string anchor.
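As a rough sketch of why the lazy quantifier matters, the snippet below runs the pattern from this answer next to a greedy variant on the longest sample input. The greedy pattern is not part of the answer and is shown only for contrast; the class name LazyVsGreedy and the groups helper are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // Hypothetical helper: {group 1, group 2} for a full match, or null if the input does not match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        String greedy = "(.+)(?:\\s\\((\\d+)\\))?$";  // contrast only, not from the answer
        String lazy   = "(.+?)(?:\\s\\((\\d+)\\))?$"; // the pattern from the answer
        String input  = "John Smith (123) (456)";

        String[] g = groups(greedy, input);
        String[] l = groups(lazy, input);
        System.out.println(g[0] + " | " + g[1]); // John Smith (123) (456) | null
        System.out.println(l[0] + " | " + l[1]); // John Smith (123) | 456
    }
}
```

The greedy .+ swallows the whole string and leaves nothing for the optional group, so Group 2 stays null; the lazy .+? stops as soon as the rest of the pattern can match up to the end of the input.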

A Java demo:

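import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Demo {
    public static void main(String[] args) {
        String[] lst = {"John Smith", "John Smith (123)", "John Smith (123) (456)"};
        // No trailing $ here: Matcher#matches() must consume the whole input anyway.
        Pattern p = Pattern.compile("(.+?)(?:\\s\\((\\d+)\\))?");
        for (String s : lst) {
            Matcher m = p.matcher(s);
            if (m.matches()) {
                System.out.println(m.group(1));      // everything before the last "(digits)"
                if (m.group(2) != null) {            // null when the optional group did not match
                    System.out.println(m.group(2));
                }
            }
        }
    }
}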