PHP preg_match/preg_replace doesn't work on the "ndash" character after file_get_contents

Question

I get a string from file_get_contents($file).

Why can't I replace the "–" (not a minus sign, but the HTML en dash, &ndash;) with PHP's preg_replace function? preg_match doesn't work either.

For example:

The output of $file is "blah – blah".

$str = file_get_contents($file);
$str = preg_replace('/–/', 'test', $str);
echo $str;

This should return blah test blah, but it returns blah – blah.

What does this mean, and how can I replace the ndash?

Thanks for your help!

Answer 1

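The file appears to contain the HTML entity for the en dash (&ndash;) rather than the literal character, so the pattern /–/ never matches the raw string. To get plain text with a literal –, you need to run html_entity_decode first.

Use: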

$str = preg_replace('/–/', 'test', html_entity_decode($str));
                                   ^^^^^^^^^^^^^^^^^^^^^^^^

$str = 'blah &ndash; blah';
echo "Original: " . $str . "\n";
$str = preg_replace('/–/', 'test', html_entity_decode($str));
echo "Replaced: " .  $str;

Output:

Original: blah &ndash; blah
Replaced: blah test blah
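
If decoding every entity in the string is not wanted, another option is to match the en dash in both of its forms directly. The following is only a sketch, not part of the original answer: the helper name replace_ndash is hypothetical, it assumes PHP 7.0 or later and UTF-8 input, and it assumes the replacement contains no `$` or backslash sequences that preg_replace would treat as backreferences.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// replaces an en dash whether it appears as an HTML entity or as a
// literal UTF-8 character.
function replace_ndash(string $text, string $replacement): string
{
    // &ndash; and &#8211; are the named and decimal entity forms.
    // \x{2013} is the literal code point; it requires the /u modifier.
    $result = preg_replace('/&ndash;|&#8211;|\x{2013}/u', $replacement, $text);

    // preg_replace returns null on failure (for example, invalid UTF-8
    // input with /u), so fall back to the unchanged text in that case.
    return $result ?? $text;
}

echo replace_ndash('blah &ndash; blah', 'test'), "\n";    // entity form
echo replace_ndash("blah \u{2013} blah", 'test'), "\n";   // literal form
```

Both calls print blah test blah. Unlike html_entity_decode, this leaves any other entities in the string (such as &amp;) untouched.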
