Evren Yurtesen - 2 months ago
PHP Question

PHP PCRE match punctuation but not ++

I searched for an answer to this for a while but could not find one. There are many posts about matching text that is not preceded by certain text, but none seems to work for this case, where a + should be matched unless it is part of a pair of consecutive + signs (e.g. ++).

I am trying to remove punctuation marks from text while letting two consecutive + signs (++) stay and making single + signs disappear.

$text="Hello World! C+ C++ C#";
print_r(preg_replace('/(?!\+\+)[[:punct:]]/', ' ', $text));


This results in the following (I am not sure why the second + is removed; can somebody explain?):


Hello World C C+ C


If I try:

$text="Hello World! C+ C++ C#";
print_r(preg_replace('/(?!\+)[[:punct:]]/', ' ', $text));


Result is:


Hello World C+ C++ C


But the result I want is:


Hello World C C++ C


Thanks

UPDATE: I realized that I should probably mention that there will be other characters I want to keep as well; I may have oversimplified the question. For example, I may also want to keep #, so the result would be:


Hello World C C++ C#


The solution should be easily expandable. I am sorry for the inconvenience caused by this missing information.

Jan
Answer

You have a couple of choices here, one being:

(?<!\+)[+#](?!\+)
# with lookarounds making sure no + is after/behind

See a demo on regex101.com.


In PHP:

<?php

// Match a single + or # that has no + directly before or after it.
$regex = '~(?<!\+)[+#](?!\+)~';

$string = 'Hello World! C+ C++ C#';
$string = preg_replace($regex, '', $string);

echo $string; // Hello World! C C++ C
?>
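
As for why the first pattern from the question also strips the second + of ++: a lookahead alone only inspects what follows the current position, and the last + of a pair is no longer followed by ++. A minimal sketch that compares match offsets for the two approaches (the short sample string and the variable names are only for illustration):

```php
<?php
// Compare the question's lookahead-only pattern with a lookbehind + lookahead pattern.
$text = 'C+ C++'; // offsets: C=0 +=1 space=2 C=3 +=4 +=5

// Lookahead only: at offset 5 the "+" is followed by the end of the string,
// not by "++", so (?!\+\+) succeeds and the "+" is matched.
preg_match_all('/(?!\+\+)[[:punct:]]/', $text, $m, PREG_OFFSET_CAPTURE);
$lookaheadOnly = array_column($m[0], 1); // byte offset of each match

// Lookbehind + lookahead: a "+" touching another "+" on either side is left alone.
preg_match_all('/(?<!\+)\+(?!\+)/', $text, $m, PREG_OFFSET_CAPTURE);
$bothSides = array_column($m[0], 1);

print_r($lookaheadOnly); // offsets 1 and 5
print_r($bothSides);     // offset 1 only
```

The lookbehind is what protects the second + of the pair.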


Another one would be to use the (*SKIP)(*FAIL) mechanism (which is a bit faster in this example):

\+{2}(*SKIP)(*FAIL)|[+#]
# let two consecutive ++ always fail

See a demo for this one on regex101.com as well.
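
In PHP, the (*SKIP)(*FAIL) variant could be used along the same lines as the lookaround example. This is only a sketch against the sample string from the question:

```php
<?php
// Sketch: (*SKIP)(*FAIL) applied to the sample string from the question.
$regex  = '~\+{2}(*SKIP)(*FAIL)|[+#]~';
$string = 'Hello World! C+ C++ C#';

// "++" is consumed by the first alternative and then discarded;
// any remaining single "+" or "#" is matched by the second alternative.
$result = preg_replace($regex, '', $string);

echo $result, PHP_EOL; // Hello World! C C++ C
```

Note that the two approaches differ on runs of three or more plus signs: the lookaround pattern leaves "+++" untouched, while this one skips the first two and removes the third.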

Last but not least: If you want to add characters/expressions that should be avoided as well, you can put them in a non-capturing group and let this one fail:

(?:\#|\+{2})(*SKIP)(*FAIL)|
[[:punct:]]

Yet another demo on the wonderful regex101.com site.
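
To make this easy to extend, the list of protected tokens could be built from an array. The following is a rough sketch; the helper name strip_punct_except and its parameters are made up for illustration, and it assumes PHP 7.4+ (arrow functions) and a non-empty list. The pattern is assembled on one line, since the two-line form above would need the x flag in PHP:

```php
<?php
// Hypothetical helper (not part of the original answer):
// remove punctuation from $text but keep every token listed in $keep.
function strip_punct_except(string $text, array $keep): string
{
    // Quote each protected token so characters like "+" and "#" are literal.
    // Assumes $keep is non-empty.
    $protected = implode('|', array_map(
        fn (string $token): string => preg_quote($token, '~'),
        $keep
    ));

    // Protected tokens are matched first, then skipped and failed;
    // every other punctuation character is removed.
    $regex = '~(?:' . $protected . ')(*SKIP)(*FAIL)|[[:punct:]]~';

    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($regex, '', $text) ?? $text;
}

echo strip_punct_except('Hello World! C+ C++ C#', ['#', '++']), PHP_EOL;
// Hello World C C++ C#
```

Adding another exception then only means adding another entry to the array.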
