I only want to print 1 story from Google News
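I'm currently trying to figure out how to print just one story from Google News. I should also mention that I'm web scraping it, which (I guess) makes it harder. I also tried googling it, but I really couldn't find anything on the internet. So here is my code: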
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
news_url = "https://news.google.com/rss?hl=de&gl=DE&ceid=DE:de"
Client = urlopen(news_url)
xml_page = Client.read()
Client.close()
soup_page = soup(xml_page, "xml")
news_list = soup_page.findAll("item")
for news in news_list:
    print(news.title.text)
    print(news.link.text)
    print(news.pubDate.text)
When I run this code, it returns a whole bunch of today's stories, but I only want to print the first one. Is there any way to do that?
You can do this with the find method, which returns only the first matching element:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
news_url = "https://news.google.com/rss?hl=de&gl=DE&ceid=DE:de"
Client = urlopen(news_url)
xml_page = Client.read()
Client.close()
soup_page = soup(xml_page, "xml")
news = soup_page.find("item")  # find() returns only the first <item> element
print(news.title.text)
print(news.link.text)
print(news.pubDate.text)
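One caveat: find() returns None when there is no matching element, so news.title would then fail with an AttributeError. Below is a minimal sketch of a guarded version. It swaps in the standard library's xml.etree.ElementTree instead of BeautifulSoup, so it runs without bs4 or lxml installed; the sample feed SAMPLE_RSS and the function name first_story are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down RSS feed used for illustration only.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>First story</title><link>https://example.com/1</link><pubDate>Thu, 28 Oct 2021 10:56:34 GMT</pubDate></item>
<item><title>Second story</title><link>https://example.com/2</link><pubDate>Thu, 28 Oct 2021 09:00:00 GMT</pubDate></item>
</channel></rss>"""


def first_story(xml_text):
    """Return (title, link, pubDate) of the first <item>, or None if the feed has none."""
    root = ET.fromstring(xml_text)
    # Like BeautifulSoup's find(), Element.find() returns only the first match, or None.
    item = root.find("channel/item")
    if item is None:
        return None
    return (item.findtext("title"), item.findtext("link"), item.findtext("pubDate"))


story = first_story(SAMPLE_RSS)
if story is not None:
    for field in story:
        print(field)
```

With the live feed from the question, you would pass xml_page (the bytes returned by Client.read()) instead of SAMPLE_RSS; ET.fromstring accepts bytes as well as str.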
Alternatively, you can use list slicing:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen
news_url = "https://news.google.com/rss?hl=de&gl=DE&ceid=DE:de"
Client = urlopen(news_url)
xml_page = Client.read()
Client.close()
soup_page = soup(xml_page, "xml")
news_list = soup_page.findAll("item")
for news in news_list[:1]:  # [:1] keeps only the first item
    print(news.title.text)
    print(news.link.text)
    print(news.pubDate.text)
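The slicing approach generalizes to "the first n stories", and it is safe on short or empty feeds: slicing past the end just yields a shorter list, whereas news_list[0] on an empty list raises IndexError. A minimal sketch, again using the standard library's xml.etree.ElementTree in place of BeautifulSoup; the function top_stories and the sample feed are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed with two items (illustration only).
SAMPLE_RSS = """<rss version="2.0"><channel>
<item><title>First story</title></item>
<item><title>Second story</title></item>
</channel></rss>"""


def top_stories(xml_text, n=1):
    """Return the titles of the first n <item> elements (fewer if the feed is shorter)."""
    items = ET.fromstring(xml_text).findall("channel/item")
    # items[:n] never raises, even when n exceeds len(items) or the list is empty.
    return [item.findtext("title") for item in items[:n]]


print(top_stories(SAMPLE_RSS))       # ['First story']
print(top_stories(SAMPLE_RSS, n=5))  # ['First story', 'Second story']
```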
Output:
Corona-News-Ticker: Die meisten Ungeimpften wollen ungeimpft bleiben - NDR.de
https://news.google.com/__i/rss/rd/articles/CBMigQFodHRwczovL3d3dy5uZHIuZGUvbmFjaHJpY2h0ZW4vaW5mby9Db3JvbmEtTmV3cy1UaWNrZXItRGllLW1laXN0ZW4tVW5nZWltcGZ0ZW4td29sbGVuLXVuZ2VpbXBmdC1ibGVpYmVuLGNvcm9uYWxpdmV0aWNrZXIxMzYyLmh0bWzSAQA?oc=5
Thu, 28 Oct 2021 10:56:34 GMT