Beautiful Soup KeyError 'href' but it definitely exists
I'm trying to extract all the links from a site. I get a "KeyError: 'href'", but as I understand it, that only happens when there is a tag without an href. However, when I look at the soup object, every tag has an href, so I don't understand why I'm seeing this error. I've searched this a lot, and everyone only ever talks about tags that have no href.
from bs4 import BeautifulSoup
from datetime import datetime
import pandas as pd
import requests

page_count = 1
catalog_page = f"https://40kaudio.com/page/{str(page_count)}/?s"

while page_count < 4:
    print(f"Begin Book Scrape from {catalog_page}")
    # Soup opens the page.
    open_page = requests.get(catalog_page)
    # We create a soup object that has all the page stuff in it
    soup = BeautifulSoup(open_page.content, "html.parser")
    # We iterate through that soup object and pull out anything with a class of "title-post"
    for link in soup.find_all('h2', "title-post"):
        print(link['href'])
    else:
        print('By the Emperor!')
There is no href attribute on link itself. However, link contains an a tag, and that a tag is what carries the href attribute:
<a href="https://40kaudio.com/justin-d-hill-cadia-stands-audiobook/" rel="bookmark">Justin D. Hill – Cadia Stands Audiobook</a>
for link in soup.find_all('h2', "title-post"):
    print(link.a['href'])
Don't forget to increment page_count inside the while loop, and to rebuild catalog_page from it, otherwise the loop requests the same page forever.
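Putting the two fixes together, here is a minimal, self-contained sketch. It parses a static HTML fragment (modeled on the h2/a markup above) instead of making a live request, so there is no network dependency; using .get("href") instead of indexing also avoids a KeyError if some a tag ever lacks an href.

```python
from bs4 import BeautifulSoup


def extract_hrefs(html: str) -> list:
    """Return the href of each <a> nested inside an <h2 class="title-post">."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for link in soup.find_all("h2", "title-post"):
        a = link.a  # the <a> nested inside the <h2>, not the <h2> itself
        if a is not None and a.get("href"):  # .get() avoids KeyError when href is missing
            hrefs.append(a["href"])
    return hrefs


# Static stand-in for one page of the catalog
html = (
    '<h2 class="title-post">'
    '<a href="https://40kaudio.com/justin-d-hill-cadia-stands-audiobook/" '
    'rel="bookmark">Justin D. Hill - Cadia Stands Audiobook</a>'
    '</h2>'
)
print(extract_hrefs(html))
```

In the real scraper you would call extract_hrefs on requests.get(catalog_page).content for each page, incrementing page_count at the bottom of the while loop.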