Extract all Links in Table with Beautiful Soup

Question

<td style="text-align: center;"><a title="Some title" href="https://www.blabla.com">Testing</a></td>

I am trying to use BeautifulSoup to get the href of every a tag that is a child of a td tag.

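I can run

urls = [x for x in soup.findAll("td")]

to get all the td tags and then loop over them manually, checking whether each one contains an a tag and extracting the href if it does, but is there a cleaner way to do this in one line?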

Answer 1

Try the :has() CSS selector to select all td tags that contain an <a> tag.

from bs4 import BeautifulSoup

html = """<td style="text-align: center;"><a title="Some title" href="https://www.blabla.com">Testing</a></td>"""
soup = BeautifulSoup(html, "html.parser")
print([tag.find("a")["href"] for tag in soup.select("td:has(a)")])

Output:

['https://www.blabla.com']
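
Note that tag.find("a") returns only the first <a> inside each cell. If a cell may hold several links, one possible variation is to select the anchors directly with a descendant selector. This is only a sketch: the sample table and the example.com URLs are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one cell with a single link,
# one cell without a link, and one cell with two links.
html = """
<table>
  <tr>
    <td><a href="https://www.blabla.com">Testing</a></td>
    <td>No link here</td>
    <td><a href="https://example.com/a">A</a> <a href="https://example.com/b">B</a></td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# "td a[href]" matches every <a> that has an href attribute
# and sits anywhere inside a <td>.
hrefs = [a["href"] for a in soup.select("td a[href]")]
print(hrefs)
```

To restrict the match to anchors that are direct children of the cell, "td > a[href]" should work in the same way.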
