dgwyer dgwyer - 4 months ago 13
PHP Question

PHP regex remove all digits except character codes

As per this thread it's pretty easy to remove all digits from a string in PHP.

For example:

$no_digits = preg_replace('/\d/', '', 'This string contains digits! 1234');


But I don't want to remove digits that are part of HTML character codes such as:

&#41;
&#169;


How can I get the regex to ignore numbers that are part of an HTML character code, i.e. digits sandwiched between &# and ; characters?

Answer

You can use the (*SKIP)(*F) verbs:

echo preg_replace('/&#\d+;(*SKIP)(*F)|\d+/', '',
                  'This string contains digits! 1234 &#41; &#169; 5678');
//=> This string contains digits!  &#41; &#169; 

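&#\d+;(*SKIP)(*F) matches a whole character code and then discards it: (*F) forces the match to fail, and (*SKIP) stops the engine from retrying inside the text it has already consumed. Only digits outside a &#\d+; pattern are left for the \d+ alternative to match.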
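If the verbs feel opaque, the same idea can be sketched with preg_replace_callback: match either a whole character code or a run of digits, and put the character code back unchanged. This is only an illustration of the logic, not a replacement for the pattern above; the function name strip_digits_keep_entities is made up for this example.

```php
<?php
// Hypothetical helper (the name is not from the thread): removes digit runs
// but leaves decimal character codes such as &#41; untouched.
function strip_digits_keep_entities(string $text): string
{
    return preg_replace_callback(
        // The first alternative is tried first, so a full &#NNN; code is
        // consumed as one match before \d+ gets a chance to see its digits.
        '/&#\d+;|\d+/',
        function (array $m): string {
            // A match starting with "&#" is a whole character code: keep it.
            // Anything else is a plain digit run: drop it.
            return strncmp($m[0], '&#', 2) === 0 ? $m[0] : '';
        },
        $text
    );
}

echo strip_digits_keep_entities('This string contains digits! 1234 &#41; &#169; 5678'), "\n";
//=> This string contains digits!  &#41; &#169; 
```

The callback version does a little more work per match, but it makes the "match it, then keep it" step explicit, which is what (*SKIP)(*F) achieves inside the regex engine.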

Alternatively you can use lookarounds:

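echo preg_replace('/(?<!\d)(?:(?<!&#)\d+|\d+(?![\d;]))/', '',
                  'This string contains digits! 1234 &#41; &#169; 5678');

This matches a whole run of 1 or more digits that is either not preceded by &# or not followed by ;, thus making it skip the &#\d+; pattern. The (?<!\d) lookbehind and the \d inside (?![\d;]) keep the match anchored to the complete run of digits; without them the engine can backtrack and remove part of a character code, e.g. the 16 in &#169;.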
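A quick way to check a lookaround pattern is to run it side by side with the (*SKIP)(*F) version over a few inputs. The sketch below is only that: the sample strings are made up, and the lookaround pattern is a variant anchored to whole digit runs with (?<!\d) and (?![\d;]), which is an assumption of this sketch.

```php
<?php
// Two candidate patterns. The second is a lookaround variant anchored to
// whole digit runs (an assumption of this sketch).
$patterns = [
    'skip-fail'  => '/&#\d+;(*SKIP)(*F)|\d+/',
    'lookaround' => '/(?<!\d)(?:(?<!&#)\d+|\d+(?![\d;]))/',
];

// Made-up inputs: plain digits, character codes, and two near misses.
$samples = [
    'This string contains digits! 1234 &#41; &#169; 5678',
    'a1b22c333',
    '&#169 is missing its semicolon',
    '12; is not a character code either',
];

// Collect the output of every pattern for every sample.
$results = [];
foreach ($samples as $sample) {
    foreach ($patterns as $name => $regex) {
        $results[$name][] = preg_replace($regex, '', $sample);
    }
}

// Print the outputs next to each other for a visual comparison.
foreach ($samples as $i => $sample) {
    printf("%-55s | %-45s | %s\n", $sample, $results['skip-fail'][$i], $results['lookaround'][$i]);
}

// Report whether the two patterns agree on all samples.
var_dump($results['skip-fail'] === $results['lookaround']);
```

If the two columns ever differ for your own data, the (*SKIP)(*F) pattern is the easier one to reason about, since it consumes the whole character code before failing.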