How to deal with pandas read_html gracefully when it fails to find a table?

Question

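pandas read_html is a great way to parse tables; however, if it cannot find a table with the specified attributes, it raises an error and the whole program fails.

I am trying to scrape thousands of web pages, and it is very annoying when the entire program terminates just because one table was not found. Is there a way to catch these errors so that the code keeps running instead of terminating?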

from io import StringIO
import pandas as pd
import requests

link = 'https://en.wikipedia.org/wiki/Barbados'
req = requests.get(link)
wiki_table = pd.read_html(StringIO(req.text), attrs = {"class":"infobox vcard"})
df = wiki_table[0]

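This causes the whole program to fail. How can I deal with it? I think it should be something related to exception handling or error catching, but I am not familiar with Python and do not know how to do it.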

Answer 1

Use try/except for this:

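from io import StringIO
import pandas as pd
import requests

link = 'https://en.wikipedia.org/wiki/Barbados'
req = requests.get(link)
try:
    # read_html raises ValueError when no matching table is found
    wiki_table = pd.read_html(StringIO(req.text), attrs = {"class":"infobox vcard"})
except ValueError:
    # No table on this page: report it and carry on
    print("Error")
    wiki_table = None
if wiki_table is not None:
    df = wiki_table[0]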

Answer 2

Wrap pd.read_html in a try ... except ... exception handler:

from io import StringIO

import requests
import pandas as pd

link = 'https://en.wikipedia.org/wiki/Barbados'
req = requests.get(link)

wiki_table = None
try:
    # pass the page text, not the Response object
    wiki_table = pd.read_html(StringIO(req.text), attrs = {"class":"infobox vcard"})
except ValueError as e: # "No tables found"; keep the caught exception as narrow as possible
    print(str(e)) # optional but useful

if wiki_table is not None:
    df = wiki_table[0]

    # rest of your code
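Since the question is about scraping thousands of pages, the same idea can be wrapped in a small helper so that each page either yields a table or is skipped. The following is only a minimal sketch: the names `first_table_or_none` and `collect_tables` are hypothetical, the helper takes HTML text that has already been downloaded, and `flavor="lxml"` assumes the lxml package is installed.

```python
from io import StringIO
from typing import Dict, Optional

import pandas as pd


def first_table_or_none(html: str, attrs: dict) -> Optional[pd.DataFrame]:
    """Return the first table matching attrs, or None if there is none."""
    try:
        # flavor="lxml" is an assumption of this sketch (lxml must be installed).
        tables = pd.read_html(StringIO(html), attrs=attrs, flavor="lxml")
    except ValueError:
        # pandas raises ValueError ("No tables found") when nothing matches.
        return None
    return tables[0]


def collect_tables(pages: Dict[str, str], attrs: dict) -> Dict[str, pd.DataFrame]:
    """Map each page name to its table, skipping pages without a match."""
    found = {}
    for name, html in pages.items():
        df = first_table_or_none(html, attrs)
        if df is None:
            print(f"no matching table in {name}, skipping")
            continue
        found[name] = df
    return found
```

In a real scrape, each `html` value would presumably come from something like `requests.get(url).text`. Network errors from requests are a separate failure mode and would need their own handling.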

python

wikipedia

exception

web-scraping

pandas