Python exception thrown by libtidy is amusingly impossible to catch

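I'm trying to use the tidy_document() function from tidylib to format an HTML document as XHTML before I post it somewhere, and a few steps up the stack an exception is thrown. The code is wrapped in a try...except block with about 3 more general except clauses to cast my net wider, but the exception propagates past them anyway, and none of the code in the except bodies is executed.

Offending code: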

from tidylib import tidy_document

...

try:
    xhtmlDoc, errors = tidy_document(htmlContent)
except UnicodeDecodeError as ude:
    print("Caught the exception")
except UnicodeError as ue:
    print("Caught the exception")
except Exception as ex:
    print("Caught the exception")
except:
    print("Caught the exception")

It doesn't matter whether htmlContent is sent as a str or as UTF-8 encoded bytes.

The resulting stack trace looks like this:

  File "_ctypes/callbacks.c", line 232, in 'calling callback function'
  File "/home/legend855/anaconda3/lib/python3.7/site-packages/tidylib/sink.py", line 79, in put_byte
    write_func(byte.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 0: unexpected end of data
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 232, in 'calling callback function'
  File "/home/legend855/anaconda3/lib/python3.7/site-packages/tidylib/sink.py", line 79, in put_byte
    write_func(byte.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaa in position 0: invalid start byte
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 232, in 'calling callback function'
  File "/home/legend855/anaconda3/lib/python3.7/site-packages/tidylib/sink.py", line 79, in put_byte
    write_func(byte.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

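Wrapping the offending line in sink.py in a try...except solves the problem, but as I understand it, that shouldn't be the library's job. The client (my code) should be able to handle the exception as it sees fit, and at the moment I don't understand why I can't. None of the print statements in my except bodies is ever executed.

P.S. I do have the calling function return a falsy value to drop the record from further processing, but I've reduced the code to the minimum needed to reproduce the error.

The HTML snippet below is what is passed as the variable htmlContent (in str or bytes form) and triggers the exception.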

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://ogp.me/ns#" lang="ja" xml:lang="ja">

<head>
  <meta http-equiv="X-UA-Compatible" content="IE=8 ; IE=9" />
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  <meta http-equiv="Content-Language" content="ja" />
  <meta name="viewport" content="width=1024, maximum-scale=1.0, user-scalable=0">
  <meta property="og:title" content="TECHNOLOGY MAKES HAPPINESS(テクノロジー メイクス ハピネス)- 先端地図技術が創るスマートライフ -|ゼンリン" />
  <meta property="og:type" content="article" />
  <meta property="og:description" content="ゼンリンが地図を制作する過程で培われた技術をアニメーションや解説を用いて紹介する特設サイトです。" />
  <meta property="og:url" content="http://www.zenrin.co.jp/create/technology/index.html" />
  <meta property="og:image" content="http://www.zenrin.co.jp/create/technology/images/ogp_image.jpg" />
  <meta property="og:site_name" content="TECHNOLOGY MAKES HAPPINESS(テクノロジー メイクス ハピネス)- 先端地図技術が創るスマートライフ -|ゼンリン" />
  <meta property="og:locale" content="ja_JP" />
  <meta property="fb:app_id" content="248887565152095" />

  <meta property="title" content="TECHNOLOGY MAKES HAPPINESS 先端地図技術が創るスマートライフ - ゼンリン" />
  <meta property="description" content="ビッグデータの世界を拓くゼンリンの先端技術で実現する“しあわせ”をご紹介します。" />
  <meta property="keywords" content="地図,住宅地図,カーナビソフト,GIS,ゼンリン,zenrin,map,地図ソフト,デジタルマップ" />

  <title>TECHNOLOGY MAKES HAPPINESS 先端地図技術が創るスマートライフ - ゼンリン</title>
  <link rel="stylesheet" type="text/css" href="common/css/common.css">
  <script type="text/javascript" src="common/js/jquery-1.9.1.min.js"></script>
  <script type="text/javascript" src="common/js/lib.js"></script>
  <script type="text/javascript" src="common/js/zenrin.js"></script>
</head>

<body style="overflow:hidden;">
  <noscript>
 <div class="noscript">
 <p>現在JavaScriptがOFFに設定されています。ゼンリンのすべての機能を使用するためには、JavaScriptの設定をONに変更してください。</p>
 </div>
</noscript>

  <div id="preloaderWrp">
    <p id="preloader">
      <img src="common/img/splash.gif" width="558" height="45">
      <img src="common/img/animation/preloader.gif" height="32" width="32" class="spinner">
    </p>
  </div>
  <script type="text/javascript">
    PreLoader.init();
  </script>
  <div id="spec_lightbox" class="lb_fit">
    <div class="inner lb_fit">
      <div class="modal_window">
        <p>
          <img src="common/img/spec_img.gif" alt="ご利用環境について" />
          <a class="closebtn" href="#">閉じる</a>
        </p>
      </div>
    </div>
  </div>
  <div id="light_box">
    <div class="inner">
      <div id="lb_bg"></div>
      <div id="modal_window">
        <div class="inner">
          <div id="spec_area">
            <img src="common/img/space.gif" id="info_spec" />
          </div>
          <div id="aniamtion_area">
            <img src="common/img/space.gif" id="info_anima" />
            <div class="preloader">
              <img src="common/img/animation/preloader.gif" height="32" width="32">
            </div>
          </div>
          <div id="last_area">
            <div id="net1_title">
              <img src="common/img/navi/happiness1.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 歩行者ネットワークが実現するしあわせ" />
            </div>
            <div id="net2_title">
              <img src="common/img/navi/happiness2.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 自動車ネットワークが実現するしあわせ" />
            </div>
            <div id="net3_title">
              <img src="common/img/navi/happiness3.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 付随情報が実現するしあわせ" />
            </div>
            <div id="lib1_title">
              <img src="common/img/navi/happiness4.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 高精度到着地点情報が実現するしあわせ" />
            </div>
            <div id="lib2_title">
              <img src="common/img/navi/happiness5.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 注記情報が実現するしあわせ" />
            </div>
            <div id="lib3_title">
              <img src="common/img/navi/happiness6.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 施設内・地下情報が実現するしあわせ" />
            </div>
            <div id="lib4_title">
              <img src="common/img/navi/happiness7.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 3次元コンテンツが実現するしあわせ" />
            </div>
            <div id="map1_title">
              <img src="common/img/navi/happiness8.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 地図データ提供技術が実現するしあわせ" />
            </div>
            <div id="mak1_title">
              <img src="common/img/navi/happiness15.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS マーケティング支援が実現するしあわせ" />
            </div>
            <div id="route_title">
              <img src="common/img/navi/happiness10.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 最適ルート案内を実現する技術" />
            </div>
            <div id="adas_title">
              <img src="common/img/navi/happiness11.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 自動車の安全運転支援を実現する技術" />
            </div>
            <div id="multi_title">
              <img src="common/img/navi/happiness12.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS ドアtoドアの誘導を実現する技術" />
            </div>
            <div id="hazard_title">
              <img src="common/img/navi/happiness13.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 事故・災害時の活用を実現する技術" />
            </div>
            <div id="area_title">
              <img src="common/img/navi/happiness14.png" height="70" width="690" alt="TECHNOLOGY MAKES HAPPINESS 営業活動支援を実現する技術" />
            </div>
            <ul>
              <li id="net1Btn">
                <a href="#network1_lightBox" class="trk_last_network1">
                  <img src="common/img/navi/btn1.jpg" height="300" width="340" alt="歩行者ネットワーク" />
                </a>
              </li>
              <li id="net2Btn">
                <a href="#network2_lightBox" class="trk_last_network2">
                  <img src="common/img/navi/btn2.jpg" height="300" width="340" alt="自動車ネットワーク" />
                </a>
              </li>
              <li id="net3Btn">
                <a href="#network3_lightBox" class="trk_last_network3">
                  <img src="common/img/navi/btn3.jpg" height="300" width="340" alt="付随情報" />
                </a>
              </li>
              <li id="lib1Btn">
                <a href="#lib1_lightBox" class="trk_last_lib1">
                  <img src="common/img/navi/btn4.jpg" height="300" width="340" alt="高精度到着地点情報" />
                </a>
              </li>
              <li id="lib2Btn">
                <a href="#lib3_lightBox" class="trk_last_lib3">
                  <img src="common/img/navi/btn5.jpg" height="300" width="340" alt="施設内・地下情報" />
                </a>
              </li>
              <li id="lib3Btn">
                <a href="#lib2_lightBox" class="trk_last_lib2">
                  <img src="common/img/navi/btn6.jpg" height="300" width="340" alt="注記情報" />
                </a>
              </li>
              <li id="map1Btn">
                <a href="#map1_lightBox" class="trk_last_map1">
                  <img src="common/img/navi/btn7.jpg" height="300" width="340" alt="地図データ提供技術" />
                </a>
              </li>
              <li id="mak1Btn">
                <a href="#mak1_lightBox" class="trk_last_mak1">
                  <img src="common/img/navi/btn8.jpg" height="300" width="340" alt="マーケティング支援" />
                </a>
              </li>
              <li id="routeBtn">
                <a href="#route_lightBox" class="trk_last_route">
                  <img src="common/img/navi/btn21.jpg" height="300" width="340" alt="最適ルート案内" />
                </a>
              </li>
              <li id="adasBtn">
                <a href="#adas_lightBox" class="trk_last_adas">
                  <img src="common/img/navi/btn22.jpg" height="300" width="340" alt="自動車の安全運転支援" />
                </a>
              </li>
              <li id="multiBtn">
                <a href="#multi_lightBox" class="trk_last_multi">
                  <img src="common/img/navi/btn23.jpg" height="300" width="340" alt="ドアtoドアの誘導" />
                </a>
              </li>
              <li id="hazardBtn">
                <a href="#hazard_lightBox" class="trk_last_hazard">
                  <img src="common/img/navi/btn24.jpg" height="300" width="340" alt="災害時の活用" />
                </a>
              </li>
              <li id="areaBtn">
                <a href="#area_lightBox" class="trk_last_area">
                  <img src="common/img/navi/btn25.jpg" height="300" width="340" alt="営業活動支援" />
                </a>
              </li>
              <li id="modal_close_Btn">
                <a href="#modal_close">
                  <img src="common/img/modal_close_btn.png" height="132" width="122">
                </a>
              </li>
            </ul>
          </div>
          <div id="trigger_area">
            <div class="trigger_inner">
              <div id="info_txt_wrp">
                <table cellpadding="0" cellspacing="0" width="730" height="150">
                  <tr>
                    <td id="info_txt"></td>
                  </tr>
                </table>
              </div>
              <div id="more_trigger">
                <a href="#" class="trk_more">
                  <div></div>
                </a>
              </div>
            </div>
          </div>
        </div>
      </div>
    </div>
  </div>


  <div id="wrapper">
    <div id="map">
      <img src="common/img/bg.jpg" alt="" id="defaultmap" />
      <img src="common/img/map/map1.jpg" alt="" id="map1" />
      <!-- <img src="common/img/map/map1.jpg" alt="" id="map1" /> -->
      <img src="common/img/map/target.png" height="131" width="226" id="target" />
      <img src="common/img/map/target.png" height="131" width="226" id="target2" />
    </div>
    <div id="slide_bg" class="clear">
      <div id="slide_content_area">
        <div id="slide1">
          <ul id="slide1_inner">
            <li class="li1">
              <a href="#skil1" class="trk_skil1">
                <img src="common/img/navi/navi1_off.jpg" height="230" width="230" alt="マーケティング支援">
              </a>
            </li>
            <li class="li2">
              <a href="#skil2" class="trk_skil2">
                <img src="common/img/navi/navi2_off.jpg" height="230" width="230" alt="ネットワーク情報">
              </a>
            </li>
            <li class="li3">
              <a href="#skil3" class="trk_skil3">
                <img src="common/img/navi/navi3_off.jpg" height="230" width="230" alt="高精度情報ライブラリ">
              </a>
            </li>
            <li class="li4">
              <a href="#skil4" class="trk_skil4">
                <img src="common/img/navi/navi4_off.jpg" height="230" width="230" alt="地図データ提供技術">
              </a>
            </li>
            <li class="li5">
              <a href="#skil1" class="trk_skil1">
                <img src="common/img/navi/navi1_off.jpg" height="230" width="230" alt="マーケティング支援">
              </a>
            </li>
            <li class="li6">
              <a href="#skil2" class="trk_skil2">
                <img src="common/img/navi/navi2_off.jpg" height="230" width="230" alt="ネットワーク情報">
              </a>
            </li>
            <li class="li7">
              <a href="#skil3" class="trk_skil3">
                <img src="common/img/navi/navi3_off.jpg" height="230" width="230" alt="高精度情報ライブラリ">
              </a>
            </li>
            <li class="li8">
              <a href="#skil4" class="trk_skil4">
                <img src="common/img/navi/navi4_off.jpg" height="230" width="230" alt="地図データ提供技術">
              </a>
            </li>
            <li class="li9">
              <a href="#skil1" class="trk_skil1">
                <img src="common/img/navi/navi1_off.jpg" height="230" width="230" alt="マーケティング支援">
              </a>
            </li>
          </ul>
        </div>


        <div id="slide2">
          <ul id="slide2_inner">
            <li class="li1">
              <a href="#route_lightBox" class="trk_route">
                <img src="common/img/navi/navi5_off.jpg" height="230" width="230" alt="Route Support 雨にぬれなくて階段がすくない行き方はないかな・・・">
              </a>
            </li>
            <li class="li2">
              <a href="#adas_lightBox" class="trk_adas">
                <img src="common/img/navi/navi6_off.jpg" height="230" width="230" alt="ADAS もしも、の時も心に余裕のある運転がしたいな">
              </a>
            </li>
            <li class="li3">
              <a href="#multi_lightBox" class="trk_multi">
                <img src="common/img/navi/navi7_off.jpg" height="230" width="230" alt="Multi Modal 車を降りてから目的地までの歩行経路が分からなくて困るな・・・">
              </a>
            </li>
            <li class="li4">
              <a href="#hazard_lightBox" class="trk_hazard">
                <img src="common/img/navi/navi8_off.jpg" height="230" width="230" alt="Hazard Database 事故や災害の時に警察や消防がすぐに駆けつけてくれるのはなぜだろう?">
              </a>
            </li>
            <li class="li5">
              <a href="#area_lightBox" class="trk_area">
                <img src="common/img/navi/navi9_off.jpg" height="230" width="230" alt="Business Support この商品が売れそうな60代女性が住む地域はどこかしら?">
              </a>
            </li>
            <li class="li6">
              <a href="#route_lightBox" class="trk_route">
                <img src="common/img/navi/navi5_off.jpg" height="230" width="230" alt="Route Support 雨にぬれなくて階段がすくない行き方はないかな・・・">
              </a>
            </li>
            <li class="li7">
              <a href="#adas_lightBox" class="trk_adas">
                <img src="common/img/navi/navi6_off.jpg" height="230" width="230" alt="ADAS もしも、の時も心に余裕のある運転がしたいな">
              </a>
            </li>
            <li class="li8">
              <a href="#multi_lightBox" class="trk_multi">
                <img src="common/img/navi/navi7_off.jpg" height="230" width="230" alt="Multi Modal 車を降りてから目的地までの歩行経路が分からなくて困るな・・・">
              </a>
            </li>
            <li class="li9">
              <a href="#hazard_lightBox" class="trk_hazard">
                <img src="common/img/navi/navi8_off.jpg" height="230" width="230" alt="Hazard Database 事故や災害の時に警察や消防がすぐに駆けつけてくれるのはなぜだろう?">
              </a>
            </li>
            <li class="li10">
              <a href="#area_lightBox" class="trk_area">
                <img src="common/img/navi/navi9_off.jpg" height="230" width="230" alt="Business Support この商品が売れそうな60代女性が住む地域はどこかしら?">
              </a>
            </li>

          </ul>
        </div>
        <div id="title">
          <img src="common/img/title.png" height="104" width="554" alt="TECHNOLOGY MAKES HAPPINESS 先端地図技術が創るスマートライフ POWERD BY ZENRIN" />
        </div>
        <div id="slash1">
          <img src="common/img/slash01.png" height="230" width="585" alt="TECHNOLOGY ビッグデータの世界を拓くゼンリンの先端技術 ADVANCED TECHNOLOGIES AND DATA SOLUTIONS." />
        </div>
        <div id="slash3">
          <img src="common/img/slash03.png" height="230" width="230" alt="" />
        </div>
        <div id="slash5">
          <img src="common/img/slash05.png" height="126" width="356" alt="" />
        </div>

        <div id="slash2">
          <img src="common/img/slash02.png" height="230" width="232" alt="" />
        </div>
        <div id="slash6">
          <img src="common/img/slash06.png" height="126" width="358" alt="" />
        </div>
        <div id="slash4">
          <img src="common/img/slash04.png" height="230" width="587" alt="HAPPINESS ゼンリンの技術で実現するしあわせ MAP TECHNOLOGY REALIZES SMART LIFE." />
        </div>
      </div>


    </div>


    <div id="content_page">
      <div id="header">
        <div class="inner">
          <div class="backto">
            <a href="#" id="backto">
              <img src="common/img/back_btn_off.png" height="49" width="204">
            </a>
          </div>
          <div id="typ1">
            <img src="common/img/typ1_header.png" height="239" width="240">
          </div>
          <div id="typ2">
            <img src="common/img/typ2_header.png" height="239" width="240">
          </div>
        </div>
      </div>



      <div id="network_navi_area">
        <div class="menuClick">
          <img class="ipad_conv" src="common/img/left_menu_hover.gif" src_i="common/img/i_left_menu_hover.gif" height="78" width="70" alt="メニューをクリック" />
        </div>
        <div class="title">
          <img src="common/img/skil_title.png" height="15" width="210" alt="ゼンリンの技術1 ネットワーク情報" />
        </div>
        <ul>
          <li>
            <a href="#network1_lightBox" class="trk_network1"><img src="common/img/left_navi01_off.png" height="70" width="211"></a>
          </li>
          <li>
            <a href="#network2_lightBox" class="trk_network2"><img src="common/img/left_navi02_off.png" height="70" width="211"></a>
          </li>
          <li>
            <a href="#network3_lightBox" class="trk_network3"><img src="common/img/left_navi03_off.png" height="70" width="211"></a>
          </li>
        </ul>
        <div class="cover"></div>
      </div>

      <div id="lib_navi_area">
        <div class="menuClick">
          <img class="ipad_conv" src="common/img/left_menu_hover.gif" src_i="common/img/i_left_menu_hover.gif" height="78" width="70" alt="メニューをクリック" />
        </div>
        <div class="title">
          <img src="common/img/skil2_title.png" height="14" width="211" alt="ゼンリンの技術2 高精度情報ライブラリ" />
        </div>
        <ul>
          <li>
            <a href="#lib1_lightBox" class="trk_lib1"><img src="common/img/left_navi04_off.png" height="70" width="211"></a>
          </li>
          <li>
            <a href="#lib2_lightBox" class="trk_lib2"><img src="common/img/left_navi05_off.png" height="70" width="211"></a>
          </li>
          <li>
            <a href="#lib3_lightBox" class="trk_lib3"><img src="common/img/left_navi06_off.png" height="70" width="211"></a>
          </li>
          <li>
            <a href="#lib4_lightBox" class="trk_lib4"><img src="common/img/left_navi07_off.png" height="70" width="211"></a>
          </li>
        </ul>
        <div class="cover"></div>
      </div>

      <div id="map_navi_area">
        <div class="menuClick">
          <img class="ipad_conv" src="common/img/left_menu_hover.gif" src_i="common/img/i_left_menu_hover.gif" height="78" width="70" alt="メニューをクリック" />
        </div>
        <div class="title">
          <img src="common/img/skil3_title.png" height="14" width="211" alt="ゼンリンの技術3 地図データ提供技術" />
        </div>
        <ul>
          <li>
            <a href="#map1_lightBox" class="trk_map1"><img src="common/img/left_navi08_off.png" height="70" width="211"></a>
          </li>
        </ul>
        <div class="cover"></div>
      </div>

      <div id="mak_navi_area">
        <div class="menuClick">
          <img class="ipad_conv" src="common/img/left_menu_hover.gif" src_i="common/img/i_left_menu_hover.gif" height="78" width="70" alt="メニューをクリック" />
        </div>
        <div class="title">
          <img src="common/img/skil4_title.png" height="14" width="211" alt="ゼンリンの技術4 マーケティング支援" />
        </div>
        <ul>
          <li>
            <a href="#mak1_lightBox" class="trk_mak1"><img src="common/img/left_navi09_off.png" height="70" width="211"></a>
          </li>
        </ul>
        <div class="cover"></div>
      </div>

      <div id="right_navi_area">
        <div class="menuClick_right">
          <img class="ipad_conv" src="common/img/right_menu_hover.gif" src_i="common/img/i_right_menu_hover.gif" height="78" width="70" alt="メニューをクリック" />
        </div>
        <div class="title" style="text-align:right;">
          <img src="common/img/happy_title.png" height="14" width="212" alt="この技術が実現するしあわせ" />
        </div>
        <ul>
          <li>
            <a href="#route_lightBox" class="trk_rnavi_route"><img src="common/img/right_navi_01_off.png" height="70" width="211"></a>
          </li>
          <li>
            <a href="#adas_lightBox" class="trk_rnavi_adas"><img src="common/img/right_navi_02_off.png" height="70" width="211"></a>
          </li>
          <li>
            <a href="#multi_lightBox" class="trk_rnavi_multi"><img src="common/img/right_navi_03_off.png" height="70" width="211"></a>
          </li>
          <li>
            <a href="#hazard_lightBox" class="trk_rnavi_hazard"><img src="common/img/right_navi_04_off.png" height="70" width="211"></a>
          </li>
          <li>
            <a href="#area_lightBox" class="trk_rnavi_area"><img src="common/img/right_navi_05_off.png" height="70" width="211"></a>
          </li>
        </ul>
        <div class="cover"></div>
      </div>
    </div>

    <div id="footer_area">
      <div class="inner">
        <div class="copyright">
          <a id="footerlogo" class="trk_footerlogo" href="http://www.zenrin.co.jp/" target="_blank"><img src="common/img/copyright.png" height="29" width="318窶?" alt="ZENRIN Maps to the Future COPYRIGHT c ZENRIN CO., LTD. ALL RIGHT RESERVED."></a>
        </div>
        <div class="spec">
          <a class="trk_spec" href="#spec_lightbox"><img src="common/img/spec_btn.gif" height="11" width="96" alt="ご利用環境について"></a>
        </div>
        <div id="social_area">
          <ul class="clearfix">
            <li>
              <a class="trk_twitter" href="http://twitter.com/share?count=horizontal&original_referer=http://www.zenrin.co.jp/create/technology/&text=TECHNOLOGY%20MAKES%20HAPPINESS%20%E5%85%88%E7%AB%AF%E5%9C%B0%E5%9B%B3%E6%8A%80%E8%A1%93%E3%81%8C%E5%89%B5%E3%82%8B%E3%82%B9%E3%83%9E%E3%83%BC%E3%83%88%E3%83%A9%E3%82%A4%E3%83%95%E3%80%90%E3%82%BC%E3%83%B3%E3%83%AA%E3%83%B3%E3%80%91%0A&url=http://www.zenrin.co.jp/create/technology/"
                onclick="window.open(this.href, 'tweetwindow', 'width=550, height=450,personalbar=0,toolbar=0,scrollbars=1,resizable=1'); return false;"><img src="common/img/twitter.png" width="30" height="20" /></a>
            </li>
            <li>
              <a class="trk_facebook" href="http://www.facebook.com/share.php?u=http://www.zenrin.co.jp/create/technology/" onclick="window.open(this.href, 'FBwindow', 'width=650, height=450, menubar=no, toolbar=no, scrollbars=yes'); return false;"><img src="common/img/facebook.png" width="25" height="20" /></a>
            </li>
          </ul>
        </div>
      </div>
    </div>
  </div>
  <div id="footer2">
    <a href="http://www.zenrin.co.jp/" target="_blank"><img src="common/img/copyright2.gif" height="60" width="363" alt="ZENRIN Maps to the Future COPYRIGHT c ZENRIN ALL RIGHT RESERVED."></a>
  </div>

  <div style="display:none;">
    <!-- for display network -->
    <script type="text/javascript" language="javascript" src="//b92.yahoo.co.jp/js/s_retargeting.js"></script>
    <script type="text/javascript">
      /* <![CDATA[ */
      var yahoo_ss_retargeting_id = 1000387951;
      var yahoo_sstag_custom_params = window.yahoo_sstag_params;
      var yahoo_ss_retargeting = true;
      /* ]]> */
    </script>
    <!-- for sponsored search -->
    <script type="text/javascript" src="//s.yimg.jp/images/listing/tool/cv/conversion.js">
    </script>
    <noscript>
<div style="display:inline;">
<img height="1" width="1" style="border-style:none;" alt="" src="//b97.yahoo.co.jp/pagead/conversion/1000387951/?guid=ON&script=0&disvt=false"/>
</div>
</noscript>
  </div>

</body>

</html>

I managed to reproduce the problem on Win (with the HTML snippet saved in a file). Below is the last code variant.

code00.py:

#!/usr/bin/env python

import sys
import os
import threading

os.environ["PATH"] += os.pathsep + os.path.abspath(os.path.dirname(__file__))  # Built tidy.dll in the cwd, this is needed for it to be found
from tidylib import tidy_document


def main(*argv):
    print("main - TID: {0:d}".format(threading.get_ident()))
    mode = "rb"
    raw_content = open("content.html", mode=mode).read()
    enc = "utf-8" if len(sys.argv) < 2 else sys.argv[1]
    html_content = raw_content.decode(enc)
    print(html_content.encode(enc) == raw_content)
    with open("content_utf8.html", "w", encoding=enc) as fout:
        fout.write(html_content)
    try:
        xhtml_doc, errors = tidy_document(html_content)
    except UnicodeDecodeError as ude:
        print("Caught the exception:", ude)
    except UnicodeError as ue:
        print("Caught the exception:", ue)
    except Exception as ex:
        print("Caught the exception:", ex)
    except:
        print("Caught an exception")


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.")
    sys.exit(rc)

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q059054833]> "e:\Work\Dev\VEnvs\py_pc064_03.08.07_test0\Scripts\python.exe" code00.py
Python 3.8.7 (tags/v3.8.7:6503f05, Dec 21 2020, 17:59:51) [MSC v.1928 64 bit (AMD64)] 64bit on win32

main - TID: 9528
True
Exception ignored on calling ctypes callback function: <function Sink.__init__.<locals>.put_byte at 0x000002144F596940>
Traceback (most recent call last):
File "e:\Work\Dev\VEnvs\py_pc064_03.08.07_test0\lib\site-packages\tidylib\sink.py", line 79, in put_byte
    write_func(byte.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 0: unexpected end of data
Exception ignored on calling ctypes callback function: <function Sink.__init__.<locals>.put_byte at 0x000002144F596940>
Traceback (most recent call last):
File "e:\Work\Dev\VEnvs\py_pc064_03.08.07_test0\lib\site-packages\tidylib\sink.py", line 79, in put_byte
    write_func(byte.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xaa in position 0: invalid start byte
Exception ignored on calling ctypes callback function: <function Sink.__init__.<locals>.put_byte at 0x000002144F596940>
Traceback (most recent call last):
File "e:\Work\Dev\VEnvs\py_pc064_03.08.07_test0\lib\site-packages\tidylib\sink.py", line 79, in put_byte
    write_func(byte.decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

Done.

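I tested (by temporarily modifying sink.py), and it is indeed the same thread. Then I looked more carefully at the stack trace and figured it out:

  1. PyTidyLib calls some C code from the backend Tidy library (tidy.dll), via CTypes
  2. That C code calls some Python code (Sink.put_byte), which was passed to it as a callback along with the arguments
  3. The (Python) code from the previous step raises an exception, but the underlying C code (which called it) doesn't "know" how to pass it back to #1., since it has no "knowledge" of Python (so the exception "dies" there)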

That's why you can't catch it in Python.

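I tried reading the file with various other encodings, without success. Then I did some more debugging, and it seems there are 3 invalid UTF-8 chars (\xE7, \xAA, \xB6 - when combined with others) in your file.
Granted, attempting to decode a UTF-8 char from a single byte seems odd to me, but that might be a PyTidyLib bug.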
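To illustrate why decoding one byte at a time is fragile, here is a small sketch using only the standard library. It takes the three byte values from the traceback as a single sequence (an assumption on my part that they were adjacent in the stream; together they happen to form one well-formed 3-byte UTF-8 sequence, possibly the stray character after 318 in the footer image's width attribute). Each byte fails on its own, while an incremental decoder that buffers partial sequences succeeds:

```python
import codecs

data = b"\xe7\xaa\xb6"  # byte values taken from the traceback

# Decode every byte in isolation, the way the callback does.
failures = []
for i in range(len(data)):
    try:
        data[i:i + 1].decode("utf-8")
    except UnicodeDecodeError as exc:
        failures.append(exc.reason)

# Feed the same bytes one by one to a decoder that keeps state between calls.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [decoder.decode(data[i:i + 1]) for i in range(len(data))]
pieces.append(decoder.decode(b"", final=True))  # flush; raises if incomplete
text = "".join(pieces)

print(failures)
print(len(text))
```

The failure reasons collected here are the same ones shown in the traceback, which suggests the bytes themselves may be fine and the per-byte decode is the real issue.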



Update #0

Since I had to build tidy.dll (as I didn't want to start Lnx VMs or install the .whl under Cygwin to do all the tests), I also uploaded it (and other artifacts) to [GitHub]: CristiFati/Prebuilt-Binaries - Prebuilt-Binaries/HTML-Tidy/v5.7.28.