Kaii Kaii - 7 months ago 43
PHP Question

regex: match pattern only if not preceded by special character / inside comment

I have input like the following example and need to replace only those \input{.*} commands that are not preceded by a % somewhere on the line.

The input is LaTeX code, where % starts a comment. Everything after a % on the current line is a comment and should not be interpreted as code, even if it looks like code.

Example input:

this is \input{REAL.tex} real content % just a \input{COMMENT.tex}
foo \input{REAL.tex} bar
\input{REAL.tex}
%\input{COMMENT.tex}
\input{REAL.tex} % comment


My current code:

$r = "/^(?P<prefix>(?!.*%).*)\\\\input[{\s]+(?P<filename>.*?)[\s}](?P<suffix>.*)$/m";
$data = preg_replace($r, "REPLACED", $data);
echo $data . PHP_EOL;


CURRENT example output:

this is \input{REAL.tex} real content % just a \input{COMMENT.tex}
foo REPLACED bar
REPLACED
%\input{COMMENT.tex}
\input{REAL.tex} % comment


EXPECTED example output:

this is REPLACED real content % just a \input{COMMENT.tex}
foo REPLACED bar
REPLACED
%\input{COMMENT.tex}
REPLACED % comment


Problem: My regex skips the \input commands on the first and last lines entirely. The lookahead assertion (?!.*%) is evaluated at the start of the line, so it rejects any line that contains a % anywhere, including one that comes after the command.
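The effect of the lookahead can be seen in isolation. The following minimal sketch applies only the (?!.*%) assertion, anchored at the start of the line, to three of the sample lines. The variable names are illustrative and not part of the original code.

```php
<?php
// The lookahead from the question, isolated: at the start of the line it
// requires that no "%" appears anywhere on the rest of that line.
$lookahead = '/^(?!.*%)/';

$lines = [
    'foo \input{REAL.tex} bar',    // no "%" at all
    '\input{REAL.tex} % comment',  // "%" after the command
    '%\input{COMMENT.tex}',        // "%" before the command
];

$results = [];
foreach ($lines as $line) {
    // preg_match returns 1 if the assertion holds, 0 otherwise.
    $results[] = preg_match($lookahead, $line);
}

echo implode(',', $results) . PHP_EOL; // prints 1,0,0
```

The second line is rejected even though its % comes after the command, which is why the first and last sample lines stay unchanged.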

Question: Is there a way to achieve the expected output with regular expressions? The \input{REAL.tex} on the first and last lines should be replaced as well.

Answer

I just realized that I don't need lookaround here at all!

Code:

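// The prefix [^%]*? cannot contain "%", so an \input after a comment marker never matches.
$r = "/^(?P<prefix>[^%]*?)\\\\input\\{(?P<filename>[^}]*)\\}(?P<suffix>.*)$/m";
// \\1 and \\3 put the prefix and suffix back around the replacement text.
$data = preg_replace($r, "\\1REPLACED\\3", $data);
echo $data . PHP_EOL;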

Output:

this is REPLACED real content % just a \input{COMMENT.tex}
foo REPLACED bar
REPLACED
%\input{COMMENT.tex}
REPLACED % comment
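
One caveat, offered as an observation rather than as part of the original answer: because the suffix group consumes the rest of the line, the pattern above replaces only the first \input on each line. If several commands can appear before the comment, one possible approach is to split each line at the first % and replace only within the code part. The sketch below uses a hypothetical helper name, replaceInputs, and, like the answer, treats every % as a comment start (an escaped \% is not handled).

```php
<?php
// Hypothetical helper (not from the original answer): replaces every
// \input{...} that appears before the first "%" on each line.
// $replacement is passed to preg_replace, so "$1"-style references in it
// would be interpreted; plain text such as "REPLACED" is safe.
function replaceInputs(string $data, string $replacement = 'REPLACED'): string
{
    return preg_replace_callback(
        '/^([^%\n]*)(.*)$/m', // group 1: code before "%", group 2: the comment
        function (array $m) use ($replacement): string {
            // Replace all commands in the code part; leave the comment untouched.
            $code = preg_replace('/\\\\input\{[^}]*\}/', $replacement, $m[1]);
            return $code . $m[2];
        },
        $data
    );
}

echo replaceInputs('a \input{A} b \input{B} % \input{C}') . PHP_EOL;
// prints: a REPLACED b REPLACED % \input{C}
```

For input with at most one \input per line, as in the question's sample, this should give the same result as the single-pattern version.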