Alex Roseland - 1 month ago
PHP Question

PHP regex catch the first pattern and everything after it even if it repeats.

I'm working on speech recognition for a chat bot and I need it to simply catch everything after a pattern (including the pattern) and put it into one of the output arrays. I assumed this would be easy but I can't get it to work. The initial dividing pattern may repeat, and if it does later in the string, the regex seems to use that later occurrence as the dividing point instead of the first one. There is probably a simple way of doing this that I am overlooking.

$input_line = "aaaa delimit bbbb delimit cccc delimit dddd delimit eeee";

preg_match("/(.+) (delimit) (.+)/", $input_line, $output_array);


I want one of the output matches to be

=> delimit bbbb delimit cccc delimit dddd delimit eeee


but the output array I'm getting is

array(4
0=>aaaa delimit bbbb delimit cccc delimit dddd delimit eeee
1=>aaaa delimit bbbb delimit cccc delimit dddd
2=>delimit
3=>eeee)


So I just want to catch the 1st delimit and everything after it even if there are other delimits. I have tried:

(.+) ((delimit) (.+)){1}


Along with other variations using *, ? and {}, but I can't seem to get it. For this example, the groups of 4 letters (e.g., aaaa) can represent any string of words that the user might input along with the delimiting word.

Answer

You get so many elements in the array because the pattern uses more capturing groups than you need, and the split lands on the last delimit because the leading greedy (.+) consumes as much as it can before backtracking. Since the regex engine scans the string from left to right, you can define your pattern as /pattern.*/s: it finds the first occurrence of pattern and then matches any 0+ chars after it (including line breaks, since the /s modifier enables DOTALL mode, in which a dot matches any char).
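As a minimal sketch of the /pattern.*/s idea, using the question's sample string and the literal word delimit as the pattern (the variable name $m is only illustrative):

```php
<?php
// Match from the first "delimit" through the end of the string.
$input_line = "aaaa delimit bbbb delimit cccc delimit dddd delimit eeee";

if (preg_match('/delimit.*/s', $input_line, $m)) {
    // $m[0] holds the whole match; no capturing groups are needed.
    echo $m[0], "\n"; // delimit bbbb delimit cccc delimit dddd delimit eeee
}
```

With no capturing groups in the pattern, the match array contains only the single element $m[0].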

To match anything before the first delimit, and then what is after it, use (.*?) before the delimit, so that the lazy *? would match any 0+ chars up to the first occurrence of delimit:

preg_match("/(.*?)(delimit.*)/s", $input_line, $m);


Sample code:

$input_line = "aaaa delimit bbbb delimit cccc delimit dddd delimit eeee";
if (preg_match("/(.*?)(delimit.*)/s", $input_line, $m)) {
  echo $m[1] . "\n";
  echo $m[2];
}

Output:

aaaa
delimit bbbb delimit cccc delimit dddd delimit eeee
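To see why the quantifier choice matters, the following sketch runs the question's original greedy pattern next to the lazy one on the same input. The variable names $greedy and $lazy are only illustrative.

```php
<?php
$input_line = "aaaa delimit bbbb delimit cccc delimit dddd delimit eeee";

// Greedy: (.+) grabs as much as possible, so the split lands on the LAST "delimit".
preg_match('/(.+) (delimit) (.+)/', $input_line, $greedy);

// Lazy: (.*?) grabs as little as possible, so the split lands on the FIRST "delimit".
preg_match('/(.*?)(delimit.*)/s', $input_line, $lazy);

echo $greedy[1], "\n";      // aaaa delimit bbbb delimit cccc delimit dddd
echo trim($lazy[1]), "\n";  // aaaa
echo $lazy[2], "\n";        // delimit bbbb delimit cccc delimit dddd delimit eeee
```

Note that the lazy group 1 is actually "aaaa " with a trailing space, since the space before delimit is not part of group 2; trim() removes it if that matters.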

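If the delimiter is a fixed literal word rather than a real pattern, a regex may not be needed at all. This is a different technique from the one in the answer: a sketch using PHP's built-in strstr(), assuming the delimiter is the plain string delimit (the variable names $after and $before are only illustrative).

```php
<?php
$input_line = "aaaa delimit bbbb delimit cccc delimit dddd delimit eeee";

// strstr() returns the part of the string starting at the first occurrence
// of the needle, or false when the needle is absent.
$after = strstr($input_line, 'delimit');

// With the third argument set to true, it returns the part before the needle.
$before = strstr($input_line, 'delimit', true);

if ($after !== false) {
    echo trim($before), "\n"; // aaaa
    echo $after, "\n";        // delimit bbbb delimit cccc delimit dddd delimit eeee
}
```

The regex version remains the better fit when the delimiter needs alternation, case-insensitivity, or word boundaries.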