How to convert euro (€) symbol from Windows-1252 to UTF-8?

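A piece of software generates a Windows-1252 XML file for me, and I want to parse it in PHP and send the data to my database as UTF-8.

I tried many solutions, such as the iconv and utf8_encode functions, but without success.

The output shows something like €, but not just ...

My XML file looks like this: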

<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
    <node>The price is 12 &#128; !</node>

&#128; seems to be the code for € (euro) in Windows-1252.

I tried these functions:

<!doctype html>
<html lang='fr'>
    <head>
        <meta charset='UTF-8'>
    </head>

    <body>

<?php
    // XML Loading in DOM Document
    // Parsing XML Node

    /* Not working */
    $node = iconv('Windows-1252', 'UTF-8', $nodeValue);

    /* Not working */
    $node = utf8_encode($nodeValue);
?>

    </body>
</html>

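As shown in this Stack Overflow question, the euro reference &#128; is converted to the Latin-1 Supplement character U+0080, and not to the "proper" euro code point, U+20AC. A workaround is to utf8_decode the value, which turns U+0080 back into the single byte 0x80, and then "re-encode" it from Windows-1252: $node = iconv('Windows-1252', 'UTF-8', utf8_decode($node));

Some working sample code: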
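To make the byte-level reasoning concrete, here is a minimal sketch that traces the conversion with bin2hex. It assumes the DOM hands back U+0080 for &#128;, and it uses iconv to ISO-8859-1 as a stand-in for utf8_decode (which is deprecated as of PHP 8.2); the variable names are only illustrative.

```php
<?php
// What the DOM returns for &#128;: code point U+0080, encoded as UTF-8.
$fromDom = "\xC2\x80";

// Converting directly misreads the two UTF-8 bytes as two Windows-1252 characters.
$direct = iconv('Windows-1252', 'UTF-8', $fromDom);
echo bin2hex($direct), "\n"; // c382e282ac, displayed as "Â€"

// Step 1: collapse U+0080 back to the single byte 0x80.
// iconv() to ISO-8859-1 stands in for utf8_decode() here (an assumption of this sketch).
$byte = iconv('UTF-8', 'ISO-8859-1', $fromDom);
echo bin2hex($byte), "\n"; // 80

// Step 2: read that byte as Windows-1252, where 0x80 is the euro sign.
$euro = iconv('Windows-1252', 'UTF-8', $byte);
echo bin2hex($euro), "\n"; // e282ac, the UTF-8 encoding of U+20AC
```

The first conversion is why a plain iconv call on the node value does not produce a clean euro sign.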

<?php
$xml = '<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
    <node>The price is 12 &#128; !</node>';

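// Parse the XML; the DOM exposes node values as UTF-8, so &#128; becomes U+0080.
$doc = new DOMDocument();
$doc->loadXML($xml);
$nodes = $doc->getElementsByTagName('node');
// utf8_decode() turns U+0080 back into the single byte 0x80, which iconv()
// then reads as the Windows-1252 euro sign and re-encodes as UTF-8 (U+20AC).
$node = iconv('Windows-1252', 'UTF-8', utf8_decode($nodes[0]->nodeValue));
echo $node;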
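One caveat with the workaround above: utf8_decode replaces any character outside ISO-8859-1 with "?", so it only suits nodes whose text fits in that range. A different approach, sketched here as an alternative rather than as part of the original answer, is to rewrite the numeric references before parsing. The helper name fixCp1252CharRefs is hypothetical, it handles decimal references only, and it assumes the installed iconv supports the UCS-4BE encoding.

```php
<?php
/**
 * Hypothetical helper: rewrite decimal references &#128; to &#159; so they
 * point at the Unicode code point of the matching Windows-1252 byte.
 */
function fixCp1252CharRefs(string $xml): string
{
    return preg_replace_callback(
        '/&#(12[89]|1[3-5][0-9]);/',
        function (array $m): string {
            // Look up the byte's code point as a 4-byte big-endian integer.
            $ucs4 = @iconv('Windows-1252', 'UCS-4BE', chr((int) $m[1]));
            if ($ucs4 === false || strlen($ucs4) !== 4) {
                return $m[0]; // no usable mapping for this byte: leave the reference as-is
            }
            return '&#' . unpack('N', $ucs4)[1] . ';';
        },
        $xml
    ) ?? $xml; // fall back to the input if the regex engine reports an error
}

$xml = '<?xml version="1.0" encoding="Windows-1252" standalone="yes"?>
    <node>The price is 12 &#128; !</node>';

// After the rewrite the document contains &#8364;, which the parser reads as U+20AC.
$doc = new DOMDocument();
$doc->loadXML(fixCp1252CharRefs($xml));
$value = $doc->getElementsByTagName('node')->item(0)->nodeValue;
echo $value;
```

With this variant the node value needs no further conversion after parsing, since the DOM already returns UTF-8.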