Lech Dutkiewicz, 4 years ago
PHP Question

PHP regex: replace whitespace with &nbsp; if it follows a single letter, but without breaking HTML tags

I've found here

preg_replace('/(?<=\b[a-z]) /i', '&nbsp;', $s);


to handle the first part of what I need. It transforms

"hello, this is a beautiful day"


into

"hello, this is a&nbsp;beautiful day".


Unfortunately, it also breaks HTML tags if they are present in the content.

"hello, this is a <a href="example.com">beautiful day</a>"


ends up as

"hello, this is a&nbsp;<a&nbsp;href="example.com">beautiful day</a>"


How can I write a regex that turns this sentence into

"hello, this is a&nbsp;<a href="example.com">beautiful day</a>"


I also have to handle some Latin Extended characters, so an example of text to fix is

Dziedziczenie dlugów spadkowych jest wciąż bardzo żywym tematem, pomimo korzystnej dla spadkobierców zmiany przepisów w 2015 roku, o której szerzej pisałem na blogu <a href="http://www.prawnik-katowice.pl/blog-prawniczy/dziedziczenie-dlugow-od-18-pazdziernika-2015-roku/">tutaj</a>.

Answer

RegEx:

(?i)<\/?\w+[^>]*>(*SKIP)(?!)|\b(\p{Latin})\s


PHP code:

preg_replace('~</?\w+[^>]*>(*SKIP)(?!)|\b(\p{Latin})\s~iu', '\\1&nbsp;', $str);

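Note the u modifier: it makes PHP treat the pattern and the subject as UTF-8, so multibyte letters such as ą or ż are matched as single characters.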
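As a rough usage sketch, the call above could be wrapped in a small helper and run on a fragment of the Polish sample. The function name nbsp_after_single_letters and the shortened example.com link are placeholders of mine, not part of the original answer, and the input (and the source file) is assumed to be valid UTF-8.

```php
<?php
// Hypothetical helper name; it only wraps the pattern from the answer.
function nbsp_after_single_letters(string $html): string
{
    $pattern = '~</?\w+[^>]*>(*SKIP)(?!)|\b(\p{Latin})\s~iu';

    // preg_replace() returns null on failure (e.g. invalid UTF-8 with the
    // u modifier); fall back to the untouched input in that case.
    return preg_replace($pattern, '\\1&nbsp;', $html) ?? $html;
}

$text = 'zmiany przepisów w 2015 roku, o której pisałem <a href="example.com">tutaj</a>.';

echo nbsp_after_single_letters($text), "\n";
// zmiany przepisów w&nbsp;2015 roku, o&nbsp;której pisałem <a href="example.com">tutaj</a>.
```

Only the one-letter words "w" and "o" get a non-breaking space; the final "w" of "przepisów" is left alone because, with the u modifier, ó counts as a word character and there is no word boundary before that "w".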

Breakdown

 (?i)               # Set case-insensitive flag
 <\/? \w+ [^>]* >   # Match opening / closing HTML tags 
 (*SKIP)(?!)        # Discard the matched tag and resume scanning after it
 |                  # Or
 \b                 # Match a word-boundary position
 ( \p{Latin} )      # Capture a single Latin-script letter
 \s                 # Match a whitespace
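
To see what the tag-skipping alternative contributes, here is a small comparison sketch that runs the letter-plus-whitespace rule on its own and then with the `(*SKIP)(?!)` branch in front. The variable names are mine and the input is a shortened version of the question's example.

```php
<?php
$s = 'this is a <a href="example.com">beautiful day</a>';

// Letter-and-whitespace rule alone: it also fires on the "a " inside "<a href".
$plain = preg_replace('~\b(\p{Latin})\s~iu', '\\1&nbsp;', $s);

// Same rule, preceded by the alternative that consumes a whole tag and
// then forces a failure, so nothing inside the tag can be replaced.
$skipping = preg_replace('~</?\w+[^>]*>(*SKIP)(?!)|\b(\p{Latin})\s~iu', '\\1&nbsp;', $s);

echo $plain, "\n";
// this is a&nbsp;<a&nbsp;href="example.com">beautiful day</a>

echo $skipping, "\n";
// this is a&nbsp;<a href="example.com">beautiful day</a>
```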