PHP preg_match/replace doesn't work on character "ndash" after file_get_contents

I get a string via file_get_contents($file).


Why can't I replace a "–" (not a minus sign, but the HTML entity &ndash;) with PHP's preg_replace function? preg_match doesn't work either.


The output of echo $str is "blah – blah".

$str = file_get_contents($file);
$str = preg_replace('/–/', 'test', $str);
echo $str;

should return
blah test blah
but returns
blah – blah

Why is that, and how can I replace an ndash?

Thanks for your help!

Answer Source

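It seems the file contains the HTML entity for the en dash (&ndash;) rather than the literal "–" character, so the pattern never matches. The browser renders the entity as a dash, which is why the output looks right. To get the plain text, run the string through html_entity_decode first.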


$str = preg_replace('/–/', 'test', html_entity_decode($str));
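Note that html_entity_decode decodes every entity in the string, not just the dash. If the rest of the markup should stay encoded, one possible alternative (my assumption, not something the question asked for) is to replace the entity itself with str_replace. The sample string and the list of numeric entity forms below are illustrative only:

```php
<?php
// Stand-in for the result of file_get_contents($file); illustrative data only.
$str = 'blah &ndash; blah &amp; more';

// Replace the named entity and its decimal/hex numeric forms directly.
// Other entities (here &amp;) are left untouched.
$str = str_replace(['&ndash;', '&#8211;', '&#x2013;'], 'test', $str);

echo $str; // blah test blah &amp; more
```

No regex is needed here because the search terms are fixed strings.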

PHP demo:

$str = 'blah &ndash; blah';
echo "Original: " . $str . "\n";
$str = preg_replace('/–/', 'test', html_entity_decode($str));
echo "Replaced: " . $str;


Original: blah &ndash; blah
Replaced: blah test blah
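
If the file may contain the dash either as an entity or as a literal character, a small wrapper can cover both cases. This is only a sketch: replace_ndash is a hypothetical helper name, and it assumes the input is UTF-8. It decodes entities first, then matches the en dash by its code point (U+2013) with the /u modifier so the result does not depend on the encoding of the PHP source file.

```php
<?php
// Hypothetical helper (not from the original answer): replaces an en dash
// whether it appears as a literal character or as an HTML entity.
// Assumes $text is UTF-8.
function replace_ndash(string $text, string $replacement): string
{
    // Turn entities such as &ndash; or &#8211; into UTF-8 characters.
    $decoded = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // \x{2013} is the en dash; /u makes PCRE treat pattern and subject as UTF-8.
    // preg_replace returns null on error (e.g. invalid UTF-8), so fall back
    // to the decoded string in that case.
    return preg_replace('/\x{2013}/u', $replacement, $decoded) ?? $decoded;
}

echo replace_ndash('blah &ndash; blah', 'test'), "\n";    // blah test blah
echo replace_ndash('blah &#8211; blah', 'test'), "\n";    // blah test blah
echo replace_ndash("blah \u{2013} blah", 'test'), "\n";   // blah test blah
```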
