Extracting a specific href from a table

Question

I am trying to extract the "10-K" URL from the following page and append it to a list:

https://www.sec.gov/Archives/edgar/data/320193/000091205701544436/0000912057-01-544436-index.htm

[Image 1]

So basically, I am trying to extract only the first link in the table, not the entries listed below it.

I eventually want to run this code in a loop over multiple similar links, but I figure I should solve this problem first.

Any ideas?

Answer 1

Hopefully this does what you need.

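import requests
from bs4 import BeautifulSoup

URL = "https://www.sec.gov/Archives/edgar/data/320193/000091205701544436/0000912057-01-544436-index.htm"
# Assumption: SEC.gov may reject requests that lack a descriptive User-Agent;
# the value below is a placeholder, replace it with your own contact details.
HEADERS = {"User-Agent": "Sample Name sample@example.com"}
page = requests.get(URL, headers=HEADERS)

soup = BeautifulSoup(page.content, "html.parser")

# Collect the href string of every link found inside a table cell.
href_list = []
for cell in soup.find_all("td"):
    for a_tag in cell.find_all("a", href=True):
        href_list.append(a_tag["href"])

print(href_list)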
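The loop above collects every link in the table. If only the 10-K row is wanted, one option is to check the text of each row's cells before taking its link. The following is a minimal sketch, not a tested solution for the live page: the helper name `find_href_by_type`, the sample HTML, and the assumption that some cell in the wanted row contains exactly "10-K" are all illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that imitates the assumed layout of the filing index table.
SAMPLE_HTML = """
<table>
  <tr><th>Seq</th><th>Description</th><th>Document</th><th>Type</th></tr>
  <tr><td>1</td><td>Annual report</td>
      <td><a href="/example/10k.txt">10k.txt</a></td><td>10-K</td></tr>
  <tr><td>2</td><td>Exhibit</td>
      <td><a href="/example/ex13.txt">ex13.txt</a></td><td>EX-13</td></tr>
</table>
"""


def find_href_by_type(html, doc_type="10-K"):
    """Return the href of the first row that has a cell equal to doc_type, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find_all("tr"):
        # Header rows use <th>, so they yield an empty list and are skipped.
        cell_texts = [td.get_text(strip=True) for td in row.find_all("td")]
        link = row.find("a", href=True)
        if doc_type in cell_texts and link is not None:
            return link["href"]
    return None


print(find_href_by_type(SAMPLE_HTML))
```

To use it on the real page, pass `page.content` instead of `SAMPLE_HTML`. Matching on cell text rather than on row position should keep working if the 10-K is not the first row on another filing's index page.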

Answer 2

I'm not sure I understand your question, but if I do, this should help:

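from bs4 import BeautifulSoup
import requests

# Assumption: SEC.gov may reject requests that lack a descriptive User-Agent;
# the value below is a placeholder, replace it with your own contact details.
headers = {"User-Agent": "Sample Name sample@example.com"}
page = requests.get("https://www.sec.gov/Archives/edgar/data/320193/000091205701544436/0000912057-01-544436-index.htm", headers=headers)

s = BeautifulSoup(page.content, "html.parser")
# Print the href of the first link inside the first table on the page.
print(s.find("table").find("a")["href"])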
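Since the question also mentions looping over several similar pages, here is a rough sketch of how the first-link lookup could be wrapped in a function and applied to a list of index URLs. The names `first_table_link` and `collect_links`, the `fetch` parameter, and the example.com pages are made up for illustration. `urljoin` is used because the href on such pages may be relative.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def first_table_link(html, base_url):
    """Return the absolute URL of the first link in the first table, or None."""
    table = BeautifulSoup(html, "html.parser").find("table")
    link = table.find("a", href=True) if table else None
    return urljoin(base_url, link["href"]) if link else None


def collect_links(index_urls, fetch):
    """Apply first_table_link to each index page; fetch(url) must return HTML."""
    results = []
    for url in index_urls:
        link = first_table_link(fetch(url), url)
        if link:  # skip pages without a table or without a link
            results.append(link)
    return results


# Offline demo: hypothetical pages standing in for real HTTP responses.
FAKE_PAGES = {
    "https://example.com/filings/one-index.htm":
        '<table><tr><td><a href="/docs/one.txt">one</a></td></tr></table>',
    "https://example.com/filings/two-index.htm": "<p>no table here</p>",
}
print(collect_links(FAKE_PAGES, FAKE_PAGES.get))
```

For real pages, `fetch` could be something like `lambda url: requests.get(url, headers=headers).text`. Passing the fetcher in as an argument keeps the parsing logic testable without network access.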


Tags: python, href