Best way to convert unicode special characters to html while escaping slashes?

Question

I find myself needing to convert some HTML text that comes from a database, and I get strings like this:

<p style=\"font-size: 10px;\">\n<strong>Search for:<\/strong> <span style=\"color:#888888;\">2 to 15 People, \u00b120$ Per Person, Informal, Available on Date<\/span>\n<\/p>

I need to turn it into proper HTML, like this:

<p style="font-size: 10px;">
<strong>Search for:</strong> <span style="color:#888888;">2 to 15 People, &plusmn;20$ Per Person, Informal, Available on Date</span>
</p>

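There are a few problems here. First, the slashes: I call stripcslashes before stripslashes so that the C-style escapes such as "\n" are converted first, then I use stripslashes to remove the quote escaping. But this messes up unicode characters such as the ± sign (\u00b1).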
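A minimal sketch of why the \u sequences get damaged, assuming PHP 7 or later. The `$sample` string is a shortened, hypothetical stand-in for the real data. stripcslashes() appears to handle any backslash sequence it does not recognise by simply dropping the backslash, so \u00b1 is reduced to the plain text u00b1 before anything has a chance to decode it.

```php
<?php
// Single quotes keep the backslashes literal, mimicking the database value.
$sample = '<span style=\"color:#888888;\">\u00b120$<\/span>\n';

// stripcslashes() converts \n and drops the backslash in \" and \/,
// but it has no notion of \uXXXX, so only the backslash is removed there.
$stripped = stripcslashes($sample);

// The escape has become ordinary text...
var_dump(strpos($stripped, 'u00b120$') !== false);
// ...and the original \u00b1 sequence can no longer be found.
var_dump(strpos($stripped, '\u00b1') === false);
```

Under these assumptions, any \uXXXX decoding has to run before stripcslashes() touches the string, or replace it altogether.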

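I've searched online, and JSON decoding seems to be the usual trick for this, but because of the kind of strings I'm dealing with, I can't use json_decode here. The string above is just an example; the real strings I'm working with are full HTML pages.

Can anyone tell me how to solve this?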

This is what I'm currently using:

$final = urlencode(stripslashes(stripcslashes(html_entity_decode($html, ENT_COMPAT, 'UTF-8'))));

It gets me a near-perfect HTML page, except for unicode characters like \u00b1.

Solution

I ended up using the solution given by Lawrence Cherone:

$new_html = str_replace(array('\"', '\/', '&quot;', '\n'), array('"', '/', '\'', "\n"), $old_html);

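// Convert one \uXXXX match (the four hex digits are in $match[1]) to UTF-8.
function unicode_convert($match){
   return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}

// Four backslashes in the PHP string give \\u in the regex: a literal backslash followed by "u".
$new_html = preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', "unicode_convert", $new_html);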

Answer 1

If I understand you correctly, you just want to replace \" with " and \/ with /, and perhaps a few others.

You can use str_replace() and target more than just the \ in the list of things to swap out.

Edit: to fix the unicode, use the preg_replace_callback code from this answer: How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?

<?php
$str = '<p style=\"font-family: &quot;Open Sans&quot;, sans-serif; font-size: 10px; color: rgb(60, 170, 80); line-height: 150%; text-align: right; padding-top: 0px; padding-bottom: 0px; margin: 0px; overflow: hidden;\"><strong>Search for:<\/strong> <span style=\"color:#888888;\">2 to 15 People, \u00b120$ Per Person, Informal, Available on Date<\/span>\n<\/p>';

// '\\\\u' is the regex \\u, i.e. a literal backslash followed by "u".
echo preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($match) {
    return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
}, str_replace(['\"', '\/', '&quot;', '\n'], ['"', '/', '\'', "\n"], $str));

Result:

<p style="font-family: 'Open Sans', sans-serif; font-size: 10px; color: rgb(60, 170, 80); line-height: 150%; text-align: right; padding-top: 0px; padding-bottom: 0px; margin: 0px; overflow: hidden;"><strong>Search for:</strong> <span style="color:#888888;">2 to 15 People, ±20$ Per Person, Informal, Available on Date</span>
</p>

https://3v4l.org/fTtLh
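To isolate the \uXXXX step, here is a small sketch that wraps it in a function. The name `decode_unicode_escapes` is hypothetical, and the code assumes PHP 7 or later with the mbstring extension loaded. It shows how pack('H*') turns the four hex digits into two raw bytes, which mb_convert_encoding() then reads as big-endian UCS-2.

```php
<?php
// Hypothetical helper name; requires the mbstring extension.
function decode_unicode_escapes(string $text): string
{
    // In a single-quoted PHP string, '\\\\u' yields the regex \\u:
    // a literal backslash followed by the letter "u".
    return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function (array $m): string {
        // pack('H*', '00b1') produces the two bytes 0x00 0xB1.
        return mb_convert_encoding(pack('H*', $m[1]), 'UTF-8', 'UCS-2BE');
    }, $text);
}

echo decode_unicode_escapes('2 to 15 People, \u00b120$ Per Person'), "\n";
```

This should print the sample text with ± in place of the escape. Because the sketch decodes each four-digit escape on its own, it only covers characters in the Basic Multilingual Plane; escapes written as surrogate pairs would need extra handling.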

html

php

unicode

unicode-escapes