Adding a space after an anchor tag with preg_replace

I want to put a space after each anchor tag so that the next word is separated from it. The problem is that some anchor tags are followed by &nbsp; characters, or by another opening HTML tag. In those cases we do not want to add a space, as it would break our records.

I only want to put a space after the anchor if there is no space already and the tag is directly followed by a word.

Right now I have come up with a regex, but I am not sure it does exactly what I want:

preg_replace("/\<\/a\>([^\s<&nbsp;])/", '</a> $1', $text, -1, $count);
print "Number of occurence in type $type = $count \n";
$this->count += $count;

I tried to check the number of occurrences before actually saving the replaced string, but it reports a much higher count than I believe is possible.

Please help me fix this regex.


<a href="blah.com">Hello</a>World // Here we need to put space between Hello and World

<a href="blah.com">Hello</a>&nbsp;World // Do not touch this

<a href="blah.com">Hello</a><b>World</b> // do not touch this

There could be many other cases that have to be ignored, but specifically we need the first scenario to be handled.

Answer Source

As @trincot pointed out, [^\s<&nbsp;] doesn't mean "not a space or a non-breaking space". It is a character class, and whatever is between the brackets is treated as a set of single characters. So it matches one character that is not whitespace, <, &, n, b, s, p or ;.
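To make the character-class point concrete, here is a minimal sketch that runs the original pattern against three strings. The sample strings are illustrative assumptions (the second one is not from the question); it was chosen because it starts with a letter that appears inside &nbsp;.

```php
<?php
// The asker's original pattern, with the unnecessary backslashes dropped.
$pattern = '~</a>([^\s<&nbsp;])~';

$samples = [
    '<a href="blah.com">Hello</a>World',       // 'W' is not in the class: match
    '<a href="blah.com">Hello</a>people',      // 'p' is in the class: no match
    '<a href="blah.com">Hello</a>&nbsp;World', // '&' is in the class: no match
];

$matched = [];
foreach ($samples as $sample) {
    // preg_match returns 1 when the pattern matches and 0 when it does not.
    $matched[] = preg_match($pattern, $sample);
}

print_r($matched);
```

The second sample shows the side effect: any word beginning with n, b, s or p is skipped, even though it is an ordinary word that should get a space.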

You need to check whether the very next character is a word character, \w (that is, [a-zA-Z0-9_]), and if so insert a space at that position. A zero-width positive lookahead does this:

 preg_replace("~</a>\K(?=\w)~", ' ', $text, -1, $count);
 echo "Number of occurrences in type $type is $count \n";

What does this RegEx mean?

</a>    # Match the closing anchor tag
\K      # Reset the match start, so </a> itself is not replaced
(?=\w)  # Assert that the next character is a word character
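As a quick check, the pattern can be run against the three cases from the question. This is only a sketch: the function name addSpaceAfterAnchor is a hypothetical helper introduced here, not something from the question or answer.

```php
<?php
// Hypothetical helper name, used only for this illustration.
function addSpaceAfterAnchor(string $html): string
{
    // </a>   closing anchor tag
    // \K     drop </a> from the reported match, so it is kept as-is
    // (?=\w) succeed only when a word character follows
    return preg_replace('~</a>\K(?=\w)~', ' ', $html);
}

$a = addSpaceAfterAnchor('<a href="blah.com">Hello</a>World');
$b = addSpaceAfterAnchor('<a href="blah.com">Hello</a>&nbsp;World');
$c = addSpaceAfterAnchor('<a href="blah.com">Hello</a><b>World</b>');

echo $a, "\n", $b, "\n", $c, "\n";
```

Only the first string gains a space; the &nbsp; and <b> cases are returned unchanged because neither & nor < is a word character.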

Update: Another solution to cover all HTML-problematic cases:

preg_replace("~</a>\K(?!&nbsp;)~", '&nbsp;', $text, -1, $count);

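This adds a non-breaking space after every closing anchor tag that is not already followed by &nbsp;. Note that it also inserts one when the tag is followed by a regular space or by another tag such as <b>.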
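A small sketch of how this second pattern behaves on the question's three cases may help when choosing between the two approaches. The variable names are assumptions made for the example.

```php
<?php
$pattern = '~</a>\K(?!&nbsp;)~';

// Followed by a word: &nbsp; is inserted.
$plain = preg_replace($pattern, '&nbsp;', '<a href="blah.com">Hello</a>World');

// Already followed by &nbsp;: the negative lookahead fails, nothing changes.
$nbsp = preg_replace($pattern, '&nbsp;', '<a href="blah.com">Hello</a>&nbsp;World');

// Followed by another tag: &nbsp; is inserted here as well.
$nested = preg_replace($pattern, '&nbsp;', '<a href="blah.com">Hello</a><b>World</b>');

echo $plain, "\n", $nbsp, "\n", $nested, "\n";
```

Because the third case is modified too, the \w lookahead version is probably the closer fit if anchors followed by other tags must stay untouched.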