Catching the BeautifulSoup HTMLParseError exception

Question

I am getting an exception from BeautifulSoup, HTMLParseError: expected name token at u'<![0Y', at line 1371, column 24, because the HTML I am reading is malformed.

How can I catch this error? I have already tried:

 try: 
     ... 
 except HTMLParseError:
     pass

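but this results in the error NameError: global name 'HTMLParseError' is not defined

I have also tried except BeautifulSoup.HTMLParseError: but that gives the error AttributeError: type object 'BeautifulSoup' has no attribute 'HTMLParseError'

More broadly, when I get a custom error from a package I am using, how can I "work out" which exception I need in order to handle it?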

Answer 1

Have you tried catching the NameError exception?

If you can't catch it, try this:

try:
    ...  # the code that raises the error goes here
except Exception as e:
    # log the exception here
    print(e)
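
On the broader question of working out which exception to name: once a catch-all like the one above has hold of the exception object, its class can tell you where it is defined. The following is a minimal sketch, not anything BeautifulSoup-specific. The helper names describe_exception and exception_bases are made up for illustration, and json.loads is only a stand-in for the failing call so the snippet runs without third-party packages.

```python
import json


def describe_exception(exc):
    """Return the fully qualified class name of an exception instance."""
    cls = type(exc)
    return "{}.{}".format(cls.__module__, cls.__name__)


def exception_bases(exc):
    """Return the class names in the exception's MRO, most specific first."""
    return [cls.__name__ for cls in type(exc).__mro__]


try:
    # Stand-in for the library call that fails (hypothetical example input).
    json.loads("{bad")
except Exception as e:
    # The module part tells you where to import the exception class from.
    print(describe_exception(e))
    # Any class in this list can be used in an except clause.
    print(exception_bases(e))
```

The module part of the printed name is what you would import from. In the question's case it would presumably point at the HTMLParser module rather than at BeautifulSoup, which would explain why BeautifulSoup.HTMLParseError does not exist.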

Answer 2

BeautifulSoup is raising HTMLParseError from the HTMLParser library. Try importing the error from that library before using it in the try/except:

# Python 2 module name, matching the traceback in the question
from HTMLParser import HTMLParseError

try:
    ...  # the code that raises the error goes here
except HTMLParseError:
    pass

More information about the HTMLParser library is here.

See where the error is raised in the BeautifulSoup source code here.
