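Beautiful Soup not returning anything
Hello, I am trying to use Beautiful Soup to scrape a web page and print the fact shown on it. The page is https://fungenerators.com/random/facts/animal/weasel. I am trying to scrape the fact, but my code always ends up printing []. Any idea what is wrong with it?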
from urllib.request import urlopen
from bs4 import BeautifulSoup
scrape = "https://fungenerators.com/random/facts/animal/weasel"
request_page = urlopen(scrape)
page_html = request_page.read()
request_page.close()
html_soup = BeautifulSoup(page_html, 'html.parser')
fact = html_soup.find_all('div', class_="wow fadeInUp animated animated")
print(fact)
Use my code instead:
import requests
from bs4 import BeautifulSoup
response = requests.get('https://fungenerators.com/random/facts/animal/weasel')
soup = BeautifulSoup(response.content, 'html.parser')
result = soup.select('div.wow.fadeInUp.animated.animated')
print(result[0].text)
The result will be:
Random Weasel Fact
Or, if you don't want to use a CSS selector, you can do this:
import requests
from bs4 import BeautifulSoup
response = requests.get('https://fungenerators.com/random/facts/animal/weasel')
soup = BeautifulSoup(response.content, 'html.parser')
result = soup.find_all('h2', class_="wow fadeInUp animated")
print(result[0].text)
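A side note on the selector in the first snippet: in a CSS selector each .name only tests whether that class is present, so repeating it (.animated.animated) adds nothing, and the order of the classes does not matter. The sketch below illustrates this on a made-up HTML fragment; the markup is an assumption for illustration, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration; the real page may differ.
html = """
<div class="wow fadeInUp animated">Random Weasel Fact</div>
<h2 class="wow fadeInUp animated">An example fact about weasels.</h2>
"""
soup = BeautifulSoup(html, "html.parser")

# Each ".name" only checks that the class is present, so the repeated
# ".animated" is redundant and both selectors match the same element.
doubled = soup.select("div.wow.fadeInUp.animated.animated")
single = soup.select("div.wow.fadeInUp.animated")
print(doubled[0] is single[0])

# Class order does not matter in a selector either.
reordered = soup.select("h2.animated.wow")
print(reordered[0].text)
```

If the div on the real page is a heading, as the output above suggests, selecting the h2 is probably what you want for the fact itself.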
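Your code has two problems:
The element you want is under an h2 tag, not a div.
Because part of the page is loaded dynamically, the class name in the HTML you download differs from what you see in the browser: the second occurrence of "animated" is absent. The class name is not wow fadeInUp animated animated, but wow fadeInUp animated.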
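To make the two points concrete, here is a minimal sketch on a made-up fragment that stands in for the downloaded HTML (the markup and the fact text are assumptions, not copied from the site). When class_ is given a string containing several classes, Beautiful Soup compares it against the class attribute value as written, so an extra or missing word means no match.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded HTML (no JavaScript has run).
html = '<h2 class="wow fadeInUp animated">An example fact about weasels.</h2>'
soup = BeautifulSoup(html, "html.parser")

# Wrong tag name and a class string that is not in the markup: no match.
print(soup.find_all("div", class_="wow fadeInUp animated animated"))  # []

# Right tag, but the class string still has the extra word: no match.
wrong_class = soup.find_all("h2", class_="wow fadeInUp animated animated")

# Right tag, class string exactly as it appears in the attribute: one match.
matches = soup.find_all("h2", class_="wow fadeInUp animated")
print(len(matches))

# A single class name also matches, and is less brittle.
by_one_class = soup.find_all("h2", class_="fadeInUp")
```

Matching on one class name avoids depending on the exact spelling and order of the whole attribute.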
See the following example:
from urllib.request import urlopen
from bs4 import BeautifulSoup
scrape = "https://fungenerators.com/random/facts/animal/weasel"
request_page = urlopen(scrape)
page_html = request_page.read()
request_page.close()
html_soup = BeautifulSoup(page_html, 'html.parser')
fact = html_soup.find_all('h2', class_="wow fadeInUp animated")
print(fact)
(Since there is only one matching tag, you may want to use find() instead of find_all(), so that you can read the text directly through the .text attribute):
...
fact = html_soup.find('h2', class_="wow fadeInUp animated").text
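One caveat with chaining .text directly onto find(): find() returns None when nothing matches, so the chained access raises AttributeError. A minimal sketch with a guard follows; extract_fact is a hypothetical helper name and the HTML fragments are invented for illustration.

```python
from bs4 import BeautifulSoup

def extract_fact(page_html):
    """Return the text of the first matching h2, or None if there is none.

    extract_fact is a hypothetical helper name used for illustration.
    """
    soup = BeautifulSoup(page_html, "html.parser")
    tag = soup.find("h2", class_="wow fadeInUp animated")
    if tag is None:
        return None
    # get_text(strip=True) also trims surrounding whitespace.
    return tag.get_text(strip=True)

# Invented fragments, not the real page.
print(extract_fact('<h2 class="wow fadeInUp animated"> A fact. </h2>'))
print(extract_fact("<p>no heading here</p>"))
```

Returning None lets the caller decide how to handle a page whose layout has changed.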