Rog Rog - 1 month ago 14
PHP Question

PHP Regex is truncating matches

A little help needed.

I am part way there I think.

I have strings like this in a body of text:

"line: this is something or other with an escaped semi-colon here \; but I want to ignore that up to this final one;"

So in the middle of my string I want to include the escaped semi-colon without treating it as the end of the string; the end of the string should be the final, unescaped semi-colon.

I have this regex pattern:

$regex = "/line:(.*?)[^\\\;];/";


It does match when I run it like this:

preg_match_all($regex, $texttosearch, $matches)


But the contents of $matches[1][0] are truncated; in this example the final 'e' is missing:

Array
(
    [0] => Array
        (
            [0] => line: this is something or other with an escaped semi-colon here \; but I want to ignore that up to this final one;
        )

    [1] => Array
        (
            [0] => this is something or other with an escaped semi-colon here \; but I want to ignore that up to this final on
        )

)


Could anyone help with where I am going wrong, please?

Thank you.

Answer

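Your pattern drops the 'e' because [^\\\;] sits outside the capture group: it consumes the one character before the final ; so that character never lands in group 1. I think that just using a lookbehind to check that a ; is not preceded by \ is error-prone if the text can contain other escape sequences (an escaped backslash right before the closing ; for instance). Use this unrolled regex instead (written as a PHP single-quoted string literal):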

'~line:([^;\\\\]*(?:\\\\.[^;\\\\]*)*);~'

See the regex demo

Details:

  • line: - literal substring (to match it as a whole word, add \b in front of it)
  • ([^;\\]*(?:\\.[^;\\]*)*) - Group 1 capturing:
    • [^;\\]* - 0+ chars other than ; and \
    • (?:\\.[^;\\]*)* - 0+ sequences of:
      • \\. - any escaped char (add the s modifier after the closing ~ delimiter to allow . to match linebreaks, too)
      • [^;\\]* - 0+ chars other than ; and \
  • ; - a semi-colon
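
To see the breakdown above in action, here is a minimal sketch that runs the unrolled pattern against the sample line from the question. The second record (a payload ending in an escaped backslash) and the variable names are made up for illustration; the lookbehind pattern is my own guess at what a "simple" lookbehind fix would look like, included only for comparison.

```php
<?php
// Hypothetical input: the sample line from the question, plus a second record
// whose payload ends in an escaped backslash (\\) before the real terminator.
$text = 'line: this is something or other with an escaped semi-colon here \; '
      . 'but I want to ignore that up to this final one;' . "\n"
      . 'line: path ends with a backslash \\\\;';

// Unrolled pattern from the answer. In a single-quoted PHP string, four
// backslashes become two, which the regex engine reads as one literal backslash.
$unrolled = '~line:([^;\\\\]*(?:\\\\.[^;\\\\]*)*);~';

// Lookbehind variant (an assumption, shown only for comparison): it accepts a
// ; only when the character right before it is not a backslash.
$lookbehind = '~line:(.*?)(?<!\\\\);~';

preg_match_all($unrolled, $text, $good);
preg_match_all($lookbehind, $text, $naive);

// Group 1 keeps the space that follows "line:", so trim it for display.
foreach ($good[1] as $payload) {
    echo trim($payload), PHP_EOL;
}

printf(
    "unrolled: %d match(es), lookbehind: %d match(es)\n",
    count($good[1]),
    count($naive[1])
);
```

The unrolled pattern finds both records with nothing cut off. The lookbehind variant misses the second one, because it cannot tell that the backslash before the closing ; is itself escaped.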