mlm mlm - 4 months ago 6
Java Question

Java regex: can I match groups using only one regex?

I have a string that follows these rules:


  1. Add a capital letter, unique to the entire string.

  2. Then add one or more groups matching the pattern \d+z, i.e. one or more digits followed by a 'z'.

  3. Repeat 1 and 2 above, zero or more times.



An example string that follows the above rules is:

"A42z19z037z21z" +
"B942z21z4842z" +
"C33z449z3884z68z20z"


(This is one string broken down for appearance.)

I need a regex that effectively does the following:


  1. Go to a specified capital letter e.g. 'B'.

  2. Match each group of \d+z (see rule 2 above) between this capital letter and the next capital letter.



This seems to need two separate regexes, one to find the location of 'B', then one to match groups until the next capital letter. Can this be done in one regex?

EDIT:

So, using the above example, the matches would be "942z", "21z", and "4842z".

Answer

I suggest using a regex with a \G-based boundary:

([A-Z]|(?!\A)\G)(\d+z)

See the regex demo

Pattern details:

  • ([A-Z]|(?!\A)\G) - Group 1 capturing either an uppercase ASCII letter or an empty string at the end of the previous successful match (\G also matches at the start of the string, so (?!\A) rules that position out)
  • (\d+z) - Group 2 capturing 1+ digits and a z.
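
To make the role of the two groups concrete, here is a small sketch that records what each find() call captures on a shortened input. The class name GTrace, the helper trace, and the sample string "A42z19zB7z" are made up for illustration and are not part of the answer's demo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GTrace {
    // Hypothetical helper: returns one "group1|group2" entry per successful find().
    static List<String> trace(String input) {
        Matcher m = Pattern.compile("([A-Z]|(?!\\A)\\G)(\\d+z)").matcher(input);
        List<String> steps = new ArrayList<>();
        while (m.find()) {
            // Group 1 is a letter when a new section starts,
            // and the empty string when \G continued the previous match.
            steps.add(m.group(1) + "|" + m.group(2));
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(trace("A42z19zB7z")); // prints [A|42z, |19z, B|7z]
    }
}
```

Note that group 2 is filled on every match, including the match whose group 1 holds the letter.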

Here is a Java demo:

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String value1 = "A42z19z037z21zB942z21z4842zC33z449z3884z68z20z";
String pattern1 = "([A-Z]|(?!\\A)\\G)(\\d+z)";
Pattern ptrn = Pattern.compile(pattern1);
Matcher matcher = ptrn.matcher(value1);
ArrayList<ArrayList<String>> result_lst = new ArrayList<ArrayList<String>>();
ArrayList<String> lst = null;
while (matcher.find()) {
    // A non-empty Group 1 means a new capital letter: start a new sublist
    if (!matcher.group(1).isEmpty()) {
        if (lst != null) result_lst.add(lst);
        lst = new ArrayList<String>();
        lst.add(matcher.group(1));
    }
    // Group 2 is captured on every match, including the one that holds the letter
    lst.add(matcher.group(2));
}
if (lst != null) result_lst.add(lst);
System.out.println(result_lst);

Output: [[A, 42z, 19z, 037z, 21z], [B, 942z, 21z, 4842z], [C, 33z, 449z, 3884z, 68z, 20z]]
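
The demo collects the groups for every capital letter. To get only the groups that follow one specified letter, as the question asks, one option is to put that letter in place of [A-Z]. The following is a minimal sketch under that assumption; the class name LetterGroups and the helper groupsAfter are hypothetical names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LetterGroups {
    // Hypothetical helper: collects the \d+z groups that directly follow `letter`.
    static List<String> groupsAfter(char letter, String input) {
        // Either the wanted letter, or the end of the previous match (but not the string start).
        String regex = "(?:" + Pattern.quote(String.valueOf(letter)) + "|(?!\\A)\\G)(\\d+z)";
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> groups = new ArrayList<>();
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String value = "A42z19z037z21zB942z21z4842zC33z449z3884z68z20z";
        System.out.println(groupsAfter('B', value)); // prints [942z, 21z, 4842z]
    }
}
```

The chain of matches stops at the next capital letter because \G can only continue from where the previous match ended, and \d+z cannot match there.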