Parse HTML to get specific tags in Python

Question

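I am trying to parse HTML source in Python using BeautifulSoup. I need to get all td tags whose id has the form nameX, where X starts at 1, so there are as many of them as there are names: name1, name2, ...

How can I do this? My simple code, which tries a regex-style pattern, does not work.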

soup = BeautifulSoup(response.text,"lxml")
resp=soup.find_all("td",{"id":'name*'})

Error:

IndexError: list index out of range

Answer 1

Use a lambda with startswith:

soup.find_all('td', id=lambda x: x and x.startswith('name'))

Or a regular expression (this needs the re module):

import re
soup.find_all('td', id=re.compile('^name'))
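
For comparison, here is a minimal self-contained sketch of why the original attempt finds nothing and how the two filters behave. The sample markup is made up for illustration (it stands in for response.text), and "html.parser" is used instead of "lxml" only so the snippet runs without an extra dependency. The stricter pattern ^name\d+$ is my own assumption about what "nameX" should mean; the plain '^name' pattern would also accept ids such as "names".

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for response.text
html = """
<table>
  <tr><td id="name1">Alice</td><td id="age1">30</td></tr>
  <tr><td id="name2">Bob</td><td>no id</td></tr>
  <tr><td id="surname1">Smith</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# A plain string is compared literally, so 'name*' is not treated as a wildcard
literal = soup.find_all("td", {"id": "name*"})

# Callable filter: x is None for tags without an id, hence the "x and ..." guard
by_prefix = soup.find_all("td", id=lambda x: x and x.startswith("name"))

# Compiled pattern: "name" followed only by digits (assumed meaning of nameX)
by_regex = soup.find_all("td", id=re.compile(r"^name\d+$"))

print(len(literal))
print([td["id"] for td in by_prefix])
print([td.get_text() for td in by_regex])
```

Since the literal lookup returns an empty list, indexing into it (for example resp[0]) is the likely source of the IndexError, although the question does not show the line that raised it.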
