DaveJohnston - 7 months ago
Java Question

Regular Expression: match everything up to an optional capture group

I have the following regular expression:

(.*)(?:([\+\-\*\/])(-?\d+(?:\.\d+)?))


The intention is to capture mathematical expressions in the form (left expression)(operator)(right operand), e.g. 1+2+3 would be captured as (1+2)(+)(3). It also handles expressions with a single operator, e.g. 1+2 would be captured as (1)(+)(2).

The problem I am having is that this regular expression won't match a single operand with no operator. For example, 5 should be matched by the first capture group with nothing in the second and third, i.e. (5)()(). If I make the last part optional:

(.*)(?:([\+\-\*\/])(-?\d+(?:\.\d+)?))?


then the initial group will always capture the entire expression. Is there any way I can make the second part optional but have it take precedence over the greedy matching done by the first group?
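
To make the behaviour concrete, here is a minimal sketch that runs both patterns from the question through Java's regex engine. The class name GreedyDemo and the helper groups are hypothetical names introduced only for this illustration; the patterns are the two shown above with their backslashes doubled.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // The two patterns from the question, with backslashes doubled for Java string literals.
    static final Pattern REQUIRED =
        Pattern.compile("(.*)(?:([\\+\\-\\*\\/])(-?\\d+(?:\\.\\d+)?))");
    static final Pattern OPTIONAL =
        Pattern.compile("(.*)(?:([\\+\\-\\*\\/])(-?\\d+(?:\\.\\d+)?))?");

    // Hypothetical helper: returns groups 1..3 of a full match, or null if the input does not match.
    static String[] groups(Pattern p, String input) {
        Matcher m = p.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // The mandatory tail forces (.*) to backtrack and give up the last operation.
        System.out.println(Arrays.toString(groups(REQUIRED, "1+2+3"))); // [1+2, +, 3]
        // ...but a lone operand cannot match at all.
        System.out.println(Arrays.toString(groups(REQUIRED, "5")));     // null
        // With the optional tail, (.*) keeps everything and the tail matches nothing.
        System.out.println(Arrays.toString(groups(OPTIONAL, "1+2+3"))); // [1+2+3, null, null]
    }
}
```

The third call shows the issue: once the tail is optional, nothing forces the greedy first group to backtrack, so it never hands the last operation over.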

Answer

Description

This regex will:

  • capture the math expression up to the last operation
  • capture the last operation
  • capture the last number in the math expression
  • assume that each number may carry a leading plus or minus sign marking it as positive or negative
  • assume that each number may be a non-integer
  • assume that the math expression can contain any number of operations, such as 1+2, 1+2+3, 1+2+3+4, ...
  • validate that the string is a math expression. Some edge cases are not accounted for here, such as parentheses or other more complex math symbols.

Raw Regular Expression

Note: because this is Java, the backslashes in this regex must be escaped when it is written as a string literal. To escape them, replace every \ with \\.

^([-+]?\d+(?:[.]\d+)?(?:[-+*/^][-+]?\d+(?:[.]\d+)?)*?)(?:([-+*/^])((?:(?<=[-+/*^])[-+]?)\d+(?:[.]\d+)?))?$

The repeated group inside the first capture group is reluctant (*?). It matches as few operations as it can, which leaves the last operation for the second and third capture groups. With a greedy * the first group would consume the whole expression and the optional tail would match nothing.

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of a "line"
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [-+]?                    any character of: '-', '+' (optional
                             (matching the most amount possible))
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
----------------------------------------------------------------------
      [.]                      any character of: '.'
----------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
    )?                       end of grouping
----------------------------------------------------------------------
    (?:                      group, but do not capture (0 or more
                             times (matching the least amount
                             possible)):
----------------------------------------------------------------------
      [-+*/^]                  any character of: '-', '+', '*', '/',
                               '^'
----------------------------------------------------------------------
      [-+]?                    any character of: '-', '+' (optional
                               (matching the most amount possible))
----------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
----------------------------------------------------------------------
        [.]                      any character of: '.'
----------------------------------------------------------------------
        \d+                      digits (0-9) (1 or more times
                                 (matching the most amount possible))
----------------------------------------------------------------------
      )?                       end of grouping
----------------------------------------------------------------------
    )*?                      end of grouping
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    (                        group and capture to \2:
----------------------------------------------------------------------
      [-+*/^]                  any character of: '-', '+', '*', '/',
                               '^'
----------------------------------------------------------------------
    )                        end of \2
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      (?:                      group, but do not capture:
----------------------------------------------------------------------
        (?<=                     look behind to see if there is:
----------------------------------------------------------------------
          [-+/*^]                  any character of: '-', '+', '/',
                                   '*', '^'
----------------------------------------------------------------------
        )                        end of look-behind
----------------------------------------------------------------------
        [-+]?                    any character of: '-', '+' (optional
                                 (matching the most amount possible))
----------------------------------------------------------------------
      )                        end of grouping
----------------------------------------------------------------------
      \d+                      digits (0-9) (1 or more times
                               (matching the most amount possible))
----------------------------------------------------------------------
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
----------------------------------------------------------------------
        [.]                      any character of: '.'
----------------------------------------------------------------------
        \d+                      digits (0-9) (1 or more times
                                 (matching the most amount possible))
----------------------------------------------------------------------
      )?                       end of grouping
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
  )?                       end of grouping
----------------------------------------------------------------------
  $                        before an optional \n, and the end of a
                           "line"
----------------------------------------------------------------------
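
The validation behaviour mentioned in the description can be checked with a small sketch like the one below. The class name ExpressionValidator and the method isExpression are hypothetical, and the pattern is written here with a reluctant *? on the repeated group inside group 1 (an assumption of this sketch; it affects how the groups are split, not which strings are accepted).

```java
import java.util.regex.Pattern;

class ExpressionValidator {
    // Assumption: the answer's pattern with backslashes doubled and a reluctant *?
    // on the repeated group inside group 1.
    static final Pattern EXPR = Pattern.compile(
        "^([-+]?\\d+(?:[.]\\d+)?(?:[-+*/^][-+]?\\d+(?:[.]\\d+)?)*?)"
        + "(?:([-+*/^])((?:(?<=[-+/*^])[-+]?)\\d+(?:[.]\\d+)?))?$");

    // Hypothetical helper: true when the whole string is a flat expression of signed numbers and operators.
    static boolean isExpression(String s) {
        return EXPR.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "5", "-3", "1+2+-3", "1.5*2^3", "1+", "(1+2)*3", "1 + 2", "abc" };
        for (String s : samples) {
            System.out.println(s + " -> " + isExpression(s));
        }
    }
}
```

The first four samples are accepted. The last four are rejected: a trailing operator, parentheses, embedded spaces, and non-numeric text are all outside what the pattern describes.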

Examples

Sample Text

1+2+-3

Sample Capture Groups

[0] = 1+2+-3
[1] = 1+2
[2] = +
[3] = -3

Online demo: http://fiddle.re/b2w5wa

Sample Text

-3

Sample Capture Groups

[0] = -3
[1] = -3
[2] = 
[3] = 

Online demo: http://fiddle.re/a77zra
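
One Java-specific detail about the second sample: when an optional group does not take part in the match, Matcher.group returns null rather than an empty string. The sketch below is one way to handle that. The class name LastOperation and the method split are hypothetical, and the pattern again assumes a reluctant *? on the repeated group inside group 1.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastOperation {
    // Assumption: the answer's pattern with backslashes doubled and a reluctant *?
    // on the repeated group inside group 1.
    static final Pattern EXPR = Pattern.compile(
        "^([-+]?\\d+(?:[.]\\d+)?(?:[-+*/^][-+]?\\d+(?:[.]\\d+)?)*?)"
        + "(?:([-+*/^])((?:(?<=[-+/*^])[-+]?)\\d+(?:[.]\\d+)?))?$");

    // Hypothetical helper: returns {left, operator, right}, using "" for the
    // operator and right operand when the input is a single number.
    // Returns null when the input is not an expression at all.
    static String[] split(String s) {
        Matcher m = EXPR.matcher(s);
        if (!m.matches()) {
            return null;
        }
        String op = m.group(2);    // null when there is no operator
        String right = m.group(3); // null when there is no right operand
        return new String[] { m.group(1), op == null ? "" : op, right == null ? "" : right };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("1+2+-3"))); // [1+2, +, -3]
        System.out.println(Arrays.toString(split("-3")));     // [-3, , ]
    }
}
```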

Sample Java Code

import java.util.regex.Pattern;
import java.util.regex.Matcher;

class Module1 {
  public static void main(String[] asd) {
    String sourcestring = "1+2+-3";
    // The repeated group inside group 1 is reluctant (*?), so the last
    // operation is left for groups 2 and 3.
    Pattern re = Pattern.compile(
        "^([-+]?\\d+(?:[.]\\d+)?(?:[-+*/^][-+]?\\d+(?:[.]\\d+)?)*?)"
        + "(?:([-+*/^])((?:(?<=[-+/*^])[-+]?)\\d+(?:[.]\\d+)?))?$",
        Pattern.MULTILINE); // ^ and $ match at the start and end of every line
    Matcher m = re.matcher(sourcestring);
    int mIdx = 0;
    while (m.find()) {
      // Group 0 is the whole match; groups 2 and 3 are null when there is no operator.
      for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
        System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}