PHP - How to handle a 'utf-16' / us-ascii encoded HTML string so that it is saved correctly in DOMDocument?

I am working on a PHP project that fetches emails and displays them on screen. For one email, it fetches the following HTML:

    <html>
    <head>

    <META http-equiv="Content-Type" content="text/html; charset=utf-16">

    <style type="text/css">
          TD {
          font-family: Verdana,Tahoma,Arial, "Sans Serif";
          font-size: 10pt;
          }
          BODY {
          font-family: Verdana,Tahoma,Arial, "Sans Serif";
          font-size: 10pt;
          }
        </style>



    </head>

      <body bgcolor="#eeeeee"><img width="1" height="1" alt="" src="https://trademe.tmcdn.co.nz/images/1pixel.gif?gen=20181128"><table cellspacing="0" cellpadding="0" width="700" bgcolor="white" align="center" style="border-left: 1px #CCCCCC solid; border-right: 1px #CCCCCC solid; border-top: 1px #CCCCCC solid;">
      <tr>

        <td height="20" colspan="4">&nbsp;</td>

      </tr>

      <tr>

        <td width="20"></td>

        <td><a href="https://www.trademe.co.nz/Track.aspx?site=2018112820201&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;"><img border="0" alt="Trade Me Logo" width="246" height="48" src="https://trademe.tmcdn.co.nz/images/new-brand-2016/common/tm-logo-2016-246x48-v1.gif?gen=2018112820201"></a><img src="https://api.trademe.co.nz/tracking/collect?evt=open&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937&amp;tid=EB71C99D-BEB4-445F-B62B-C172AC5A4CF4"></td>

        <td align="center"></td>

        <td width="20"></td>

      </tr>

      <tr>

        <td width="20"></td>

        <td colspan="2">

          <hr size="0" color="#CCCCCC">

          <center><small>Security Note: Trade Me will never ask you for your password via email</small></center>

          <hr size="0" color="#CCCCCC">

        </td>

        <td width="20"></td>

      </tr>

      <tr>

        <td width="20"></td>

        <td colspan="2" style="padding-left: 10px; padding-top: 10px;"><small>

      This is an automated email regarding listing #: 1847238571</small><br><br>

    Hi Matthew,

    <br><br><div>

      A member has asked a question on your listing for "2.4KW 2400W 3KVA 24VDC Pure Sine Wave Power Inverter Solar Caravan Off Grid LCD".

    </div><br><table width="100%" cellpadding="3" cellspacing="0" border="0">

            <tr>

              <td align="center" width="20"><img width="20" height="20" alt="" src="https://trademe.tmcdn.co.nz/images/icon_question.gif">&nbsp;</td>

              <td>what is the warranty like? &nbsp;&nbsp;<small><i>posted by:&nbsp;</i></small>&nbsp;<b><a href="https://www.trademe.co.nz/Members/Listings.aspx?member=4187691&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">matihegarty</a></b>

    (<a href="https://www.trademe.co.nz/Members/Feedback.aspx?member=4187691&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">5</a>&nbsp;<a href="https://www.trademe.co.nz/Members/Feedback.aspx?member=4187691&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937"><img align="absmiddle" border="0" src="https://www.trademe.co.nz/images/star.gif"></a>)

  &nbsp;&nbsp;&nbsp;<small>8:54 pm, Wed 28 Nov</small></td>

            </tr>

          </table><br><br><center><b><font size="3"><a href="https://www.trademe.co.nz/a.asp?id=1847238571&amp;qna=true#qna&amp;tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">Answer this question</a></font></b></center><br><br><div>

      We recommend you answer all questions on your listings to help buyers make informed decisions. Questions on vehicle listings created in Trade Me Motors will be displayed automatically. For other listings, questions will only be displayed if answered.

    </div><br><br>

    Happy trading!

    <br><br>

    The Trade Me team

    <br><a href="https://www.trademe.co.nz/?tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">www.trademe.co.nz</a><br><br><small>

      If you don't wish to receive these emails or prefer plain text email, please update your

      <a href="https://www.trademe.co.nz/MyTradeMe/EmailOptions.aspx?tm=email&amp;et=201&amp;mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">email options</a></small></td>

        <td width="20"></td>

      </tr>

      <tr>

        <td colspan="3">

          <table cellspacing="0" cellpadding="0" border="0" width="100%" align="center" style="background-color:White;">

            <tr>

              <td align="center"><br><small><img width="7" height="8" src="https://trademe.tmcdn.co.nz/images/3/common/triangle.gif">&nbsp;<font color="#666666">advertisement</font></small><br><br></td>

            </tr>

          </table>

          <table cellspacing="0" cellpadding="0" border="0" width="100%" align="center" style="background-color:#9A9A9A;">

            <tr>

              <td><a href="https://www.trademe.co.nz/Link.aspx?i=101247"><img style="border-width:0;" src="https://trademe.tmcdn.co.nz/photoserver/adserver/TMI0003-700x70-mates-FA.png?e=" alt="" width="700" height="70"></a></td>

            </tr>

          </table>

        </td>

      </tr>

    </table>

  </body>

</html>

My program does this:

    $cleanMessage = new DOMDocument();
    @$cleanMessage->loadHTML($this->bodyHTML); //To clean the html code for unclosed td table tags and other 

    $this->message = $cleanMessage->saveHTML();

But my output is:

��<�!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> <�html><�head><�meta http-equiv="Content-Type" content="text/html; charset=utf-16"><�style type="text/css"> TD { font-family: Verdana,Tahoma,Arial, "Sans Serif"; font-size: 10pt; } BODY { font-family: Verdana,Tahoma,Arial, "Sans Serif"; font-size: 10pt; } <�/style><�/head><�body bgcolor="#eeeeee"><�img width="1" height="1" alt="" src="https://trademe.tmcdn.co.nz/images/1pixel.gif?gen=20181128"><�table cellspacing="0" cellpadding="0" width="700" bgcolor="white" align="center" style="border-left: 1px #CCCCCC solid; border-right: 1px #CCCCCC solid; border-top: 1px #CCCCCC solid;"><�tr><�td height="20" colspan="4">�<�/td> <�/tr><�tr><�td width="20"><�/td> <�td><�a href="https://www.trademe.co.nz/Track.aspx?site=2018112820201&tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;"><�img border="0" alt="Trade Me Logo" width="246" height="48" src="https://trademe.tmcdn.co.nz/images/new-brand-2016/common/tm-logo-2016-246x48-v1.gif?gen=2018112820201"><�/a><�img src="https://api.trademe.co.nz/tracking/collect?evt=open&tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937&tid=EB71C99D-BEB4-445F-B62B-C172AC5A4CF4"><�/td> <�td align="center"><�/td> <�td width="20"><�/td> <�/tr><�tr><�td width="20"><�/td> <�td colspan="2"> <�hr size="0" color="#CCCCCC"><�center><�small>Security Note: Trade Me will never ask you for your password via email<�/small><�/center> <�hr size="0" color="#CCCCCC"><�/td> <�td width="20"><�/td> <�/tr><�tr><�td width="20"><�/td> <�td colspan="2" style="padding-left: 10px; padding-top: 10px;"><�small> This is an automated email regarding listing #: 1847238571<�/small><�br><�br> Hi Matthew, <�br><�br><�div> A member has asked a question on your listing for "2.4KW 2400W 3KVA 24VDC Pure Sine Wave Power Inverter Solar Caravan Off Grid LCD". 
<�/div><�br><�table width="100%" cellpadding="3" cellspacing="0" border="0"><�tr><�td align="center" width="20"><�img width="20" height="20" alt="" src="https://trademe.tmcdn.co.nz/images/icon_question.gif">�<�/td> <�td>what is the warranty like? ��<�small><�i>posted by:�<�/i><�/small>�<�b><�a href="https://www.trademe.co.nz/Members/Listings.aspx?member=4187691&tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">matihegarty<�/a><�/b> (<�a href="https://www.trademe.co.nz/Members/Feedback.aspx?member=4187691&tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">5<�/a>�<�a href="https://www.trademe.co.nz/Members/Feedback.aspx?member=4187691&tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937"><�img align="absmiddle" border="0" src="https://www.trademe.co.nz/images/star.gif"><�/a>) ���<�small>8:54 pm, Wed 28 Nov<�/small><�/td> <�/tr><�/table><�br><�br><�center><�b><�font size="3"><�a href="https://www.trademe.co.nz/a.asp?id=1847238571&qna=true#qna&tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">Answer this question<�/a><�/font><�/b><�/center><�br><�br><�div> We recommend you answer all questions on your listings to help buyers make informed decisions. Questions on vehicle listings created in Trade Me Motors will be displayed automatically. For other listings, questions will only be displayed if answered. <�/div><�br><�br> Happy trading! 
<�br><�br> The Trade Me team <�br><�a href="https://www.trademe.co.nz/?tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">www.trademe.co.nz<�/a><�br><�br><�small> If you don't wish to receive these emails or prefer plain text email, please update your <�a href="https://www.trademe.co.nz/MyTradeMe/EmailOptions.aspx?tm=email&et=201&mt=75D6A1C7-4DEA-4B06-A3E9-6A12C1B41937" style="text-decoration: underline;">email options<�/a><�/small><�/td> <�td width="20"><�/td> <�/tr><�tr><�td colspan="3"> <�table cellspacing="0" cellpadding="0" border="0" width="100%" align="center" style="background-color:White;"><�tr><�td align="center"><�br><�small><�img width="7" height="8" src="https://trademe.tmcdn.co.nz/images/3/common/triangle.gif">�<�font color="#666666">advertisement<�/font><�/small><�br><�br><�/td> <�/tr><�/table><�table cellspacing="0" cellpadding="0" border="0" width="100%" align="center" style="background-color:#9A9A9A;"><�tr><�td><�a href="https://www.trademe.co.nz/Link.aspx?i=101247"><�img style="border-width:0;" src="https://trademe.tmcdn.co.nz/photoserver/adserver/TMI0003-700x70-mates-FA.png?e=" alt="" width="700" height="70"><�/a><�/td> <�/tr><�/table><�/td> <�/tr><�/table><�/body><�/html>

I have tried:

1. Both lines together:

        $this->bodyHTML = mb_convert_encoding($this->bodyHTML,'UTF-8','utf-16');
        $this->bodyHTML = mb_convert_encoding($this->bodyHTML,'HTML-ENTITIES','UTF-8');

2. A single call:

        $this->bodyHTML = mb_convert_encoding($this->bodyHTML,'HTML-ENTITIES','UTF-16');

But it still shows garbled text or Chinese characters.

What is the correct way to display this HTML properly?

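If you see strange characters in the output, replace the utf-16 charset declaration in your HTML with utf-8 (or ISO-8859-1) before loading it. The message body here appears to be single-byte (us-ascii) text; only the meta tag claims utf-16, and DOMDocument seems to honor that declaration when serializing, which would explain the byte-order mark and the stray byte after every character in the output.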

    $this->bodyHTML = str_replace("charset=utf-16","charset=utf-8", $this->bodyHTML);
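
As a rough sketch of how that replacement could be combined with the DOMDocument cleanup from the question: the helper name `normalizeEmailHtml` is hypothetical, and the code assumes the bytes really are ASCII/UTF-8 and that only the declared charset is wrong. If the string were genuinely UTF-16 encoded, it would need to be converted first instead.

```php
<?php
// Hypothetical helper (not from the original post): rewrite a misleading
// utf-16 charset declaration, then let DOMDocument tidy up the markup.
function normalizeEmailHtml(string $html): string
{
    // Assumption: the body is single-byte text and only the declaration
    // claims utf-16. Case-insensitive, so CHARSET=UTF-16 is handled too.
    $html = preg_replace(
        '/(charset\s*=\s*["\']?)utf-16(?:le|be)?/i',
        '${1}utf-8',
        $html
    ) ?? $html;

    $doc = new DOMDocument();

    // Collect libxml warnings about sloppy email markup instead of using @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // With the declaration now utf-8, the serialized output is expected to
    // be UTF-8 as well.
    return $doc->saveHTML();
}
```

In the code from the question this might be called as `$this->message = normalizeEmailHtml($this->bodyHTML);`. Using `libxml_use_internal_errors()` rather than `@` is a design choice here: it keeps the parser warnings out of the output while leaving other errors visible.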