"Â " character showing up instead of " "
I found this thread, which describes my issue pretty well, and this answer, which describes my exact problem:
The non-breaking space character is byte 0xA0 in ISO-8859-1; when encoded to UTF-8 it'd be 0xC2,0xA0, which, if you (incorrectly) view it as ISO-8859-1, comes out as "Â ". That includes a trailing nbsp...
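A minimal PHP sketch of the byte-level effect described in that quote, assuming the mbstring extension is enabled: the UTF-8 encoding of U+00A0 is misread as ISO-8859-1 and then re-encoded as UTF-8, which is exactly the "Â " that ends up on the page.

```php
<?php
// A non-breaking space (U+00A0) written as UTF-8 is the two bytes 0xC2 0xA0.
$nbspUtf8 = "\u{00A0}";
echo bin2hex($nbspUtf8), "\n"; // c2a0

// Misreading those two bytes as ISO-8859-1 yields two characters:
// 0xC2 -> "Â" (U+00C2) and 0xA0 -> NBSP (U+00A0).
// Re-encoding that misreading as UTF-8 gives the bytes that reach the browser.
$mojibake = mb_convert_encoding($nbspUtf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($mojibake), "\n"; // c382c2a0
```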
However, I have managed to trace the problem back to a function I use to wrap image tags in a div:
function img_format($str)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($str); // <-- Bonus points for the explanation of the @
    // $tags object
    $tags = $doc->getElementsByTagName('img');
    foreach ($tags as $tag) {
        $div = $doc->createElement('div');
        $div->setAttribute('class', 'inner-copy');
        $tag->parentNode->insertBefore($div, $tag);
        $div->appendChild($tag);
        $tag->setAttribute('class', 'inner-img');
    }
    $str = $doc->saveHTML();
    return $str;
}
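On the `@`: it is PHP's error-control operator, which silences every warning raised while evaluating the expression, here the ones `loadHTML()` emits for markup libxml considers invalid (unknown tags, stray end tags and so on). It has nothing to do with the encoding problem. A sketch of the more targeted alternative, `libxml_use_internal_errors()`, which collects those parser warnings so they can be inspected or ignored deliberately; the sample markup is purely illustrative.

```php
<?php
// Collect libxml parse warnings in a buffer instead of hiding them with "@".
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML('<p>Unclosed paragraph<nav>possibly unknown to libxml</nav>');

// Inspect (or log) whatever the parser complained about, then empty the buffer.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);
```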
Simply put, how do I fix this problem within this function?
I understand that using
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
will solve the problem, but clearly I'm overlooking something in the function itself.
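The meta-tag route can also be applied inside the function rather than in the page template: libxml honours an http-equiv Content-Type declaration found in the markup it parses. A minimal sketch, where `load_utf8_fragment()` is a hypothetical helper name, not part of DOMDocument; note that `saveHTML()` will then also output the injected meta element.

```php
<?php
// Hypothetical helper: parse an HTML fragment as UTF-8 by prepending the same
// Content-Type declaration quoted above, so libxml stops assuming ISO-8859-1.
function load_utf8_fragment(string $html): DOMDocument
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parse warnings quietly
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $doc;
}

$doc = load_utf8_fragment("<p>a\u{00A0}b</p>");
echo $doc->saveHTML();
```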
I have tried
$doc->validateOnParse = true;
to no avail. (I'm not quite sure what that does.)
Found it!
@$doc->loadHTML(mb_convert_encoding($str, 'HTML-ENTITIES', 'UTF-8'));
This answer explains the problem and gives the workaround above:
DOMDocument::loadHTML will treat your string as being in ISO-8859-1 unless you tell it otherwise. This results in UTF-8 strings being interpreted incorrectly.
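The same workaround folded back into the original function, as a sketch. Two choices in it are assumptions rather than part of the quoted answer: it uses `mb_encode_numericentity()` instead of the `'HTML-ENTITIES'` target encoding (PHP 8.2 deprecated that pseudo-encoding in mbstring), and it replaces the `@` with `libxml_use_internal_errors()`. Either way, every non-ASCII character reaches `loadHTML()` as a numeric entity, so its ISO-8859-1 default never sees a raw multibyte sequence.

```php
<?php
// Sketch of img_format() with the encoding fix applied (requires mbstring).
function img_format(string $str): string
{
    // Turn every code point from U+0080 upward into &#NNN; so the input is pure ASCII.
    $ascii = mb_encode_numericentity($str, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect, rather than print, parse warnings
    $doc->loadHTML($ascii);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live NodeList first so moving nodes cannot disturb the loop.
    foreach (iterator_to_array($doc->getElementsByTagName('img')) as $tag) {
        $div = $doc->createElement('div');
        $div->setAttribute('class', 'inner-copy');
        $tag->parentNode->insertBefore($div, $tag);
        $div->appendChild($tag);
        $tag->setAttribute('class', 'inner-img');
    }

    return $doc->saveHTML();
}

echo img_format("<p>Caption\u{00A0}text <img src=\"a.png\"></p>");
```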