Thys Thys - 1 year ago 75
PHP Question

Variable sized lookahead consume

I'm trying to use a regular expression to parse a varying string in PHP. That string can be, for example:

"twoX // threeY"


or

"twoX /// threeY"


So there is a left keyword, a divider consisting of 2 or 3 slashes, and a right keyword. These are also the parts I would like to capture separately.

"/((?<left>.+)?)(?=(?<divider>[\/]{2,3}))([\/]{2,3})((?<right>.+)?)/";


When I use this regular expression on the first string, everything gets parsed correctly, so;


left: twoX

divider: //

right: threeY


but when I run this expression on the second string, the left and the divider don't get parsed properly. The result I then get is;


left: twoX /

divider: //

right: threeY


I use {2,3} in the regular expression to match either 2 or 3 slashes for the divider, but this somehow doesn't seem to work together with the match-all character (the dot).

Is there a way to get the regex to parse either 2 or 3 slashes without duplicating the entire sequence?

Answer Source

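The (.+)? is a greedy dot pattern: it matches as many chars as possible, with 1 being the minimum, and then backtracks only as far as needed for the rest of the pattern to match. Since the divider pattern requires only 2 slashes, backtracking stops as soon as 2 slashes are available, so only the last 2 slashes are captured into the divider group and the first / stays in the "left" group.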
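To make the backtracking visible, here is a minimal sketch that runs the pattern from the question against the three-slash input. The variable names ($greedy, $m) are only for illustration. Note that the captured values also contain the spaces around the divider, which the output listed in the question does not show.

```php
<?php
// Pattern copied from the question: the "left" group uses a greedy .+
$greedy = "/((?<left>.+)?)(?=(?<divider>[\/]{2,3}))([\/]{2,3})((?<right>.+)?)/";

preg_match($greedy, "twoX /// threeY", $m);

// Quotes are printed so that the surrounding spaces are visible.
printf("left: '%s'\n", $m['left']);       // left: 'twoX /'
printf("divider: '%s'\n", $m['divider']); // divider: '//'
printf("right: '%s'\n", $m['right']);     // right: ' threeY'
```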

Use a lazy pattern in the first group:

'~(?<left>.*?)(?<divider>/{2,3})(?<right>.*)~'
          ^^^

See the regex demo. Add ^ and $ anchors around the pattern to match the whole string if necessary.

Note that you do not need to repeat the same pattern in the lookahead and in the consuming part; it only makes the pattern cumbersome. (?=(?<divider>[\/]{2,3}))([\/]{2,3}) matches the same text as (?<divider>[\/]{2,3}).

Details

  • (?<left>.*?) - Group "left" that matches any 0+ chars other than line break chars as few as possible
  • (?<divider>/{2,3}) - 2 or 3 slashes (no need to escape since ~ is used as a regex delimiter)
  • (?<right>.*) - Group "right" matching any 0+ chars other than line break chars as many as possible (up to the end of line).
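As a rough sketch of how the lazy pattern can be used from PHP, the snippet below wraps it in a helper. The function name parseKeywords, the added ^ and $ anchors, and the trim() calls are assumptions made for this example; trimming is there only because the captured keywords keep the spaces around the divider.

```php
<?php
// Hypothetical helper for illustration; not part of the original answer.
function parseKeywords(string $s): ?array
{
    // Lazy "left" group, so the divider can take all 2 or 3 slashes.
    $pattern = '~^(?<left>.*?)(?<divider>/{2,3})(?<right>.*)$~';

    if (preg_match($pattern, $s, $m) !== 1) {
        return null; // no divider of 2 or 3 slashes found
    }

    return [
        'left'    => trim($m['left']),
        'divider' => $m['divider'],
        'right'   => trim($m['right']),
    ];
}

foreach (["twoX // threeY", "twoX /// threeY"] as $input) {
    $p = parseKeywords($input);
    printf("left: %s, divider: %s, right: %s\n", $p['left'], $p['divider'], $p['right']);
}
// left: twoX, divider: //, right: threeY
// left: twoX, divider: ///, right: threeY
```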

And a more natural-looking splitting approach (see a PHP demo):

$s = "twoX // threeY";
print_r(preg_split('~\s*(/{2,3})\s*~', $s, -1, PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY));
// => Array ( [0] => twoX [1] => // [2] => threeY )

You lose the names, but you may add them at a later step.
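One possible way to add the names back after splitting is array_combine(). This is only a sketch: the helper name splitKeywords and the choice to return null when the split does not yield exactly three parts are assumptions, not something the answer prescribes.

```php
<?php
// Hypothetical helper for illustration; not part of the original answer.
function splitKeywords(string $s): ?array
{
    $parts = preg_split(
        '~\s*(/{2,3})\s*~',
        $s,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    // Expect exactly: left keyword, divider, right keyword.
    if ($parts === false || count($parts) !== 3) {
        return null;
    }

    return array_combine(['left', 'divider', 'right'], $parts);
}

$named = splitKeywords("twoX /// threeY");
printf("left: %s, divider: %s, right: %s\n", $named['left'], $named['divider'], $named['right']);
// left: twoX, divider: ///, right: threeY
```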
