Link attribute not getting printed in BeautifulSoup object
I am writing a program that extracts the top headlines from Google News. It should print each article's title and link. However, it does not print the link.
import bs4
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen

news_url = "https://news.google.com/news/rss"
Client = urlopen(news_url)
xml_page = Client.read()
Client.close()

soup_page = soup(xml_page, "lxml")
news_list = soup_page.findAll("item")
# Print news title, url and publish date
for news in news_list:
    print(news.title.text)
    print(news.link.text)
    print("-" * 10)
Here is a sample of the output:
Following Falcon 9 Saturday launch, CRS-17 Dragon arrives at the ISS
----------
It should print the title and the link, but it only prints the title.
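You should change this line in your code:
soup_page = soup(xml_page, "lxml")
to:
soup_page = soup(xml_page, "xml")
and you will get the expected result. The "lxml" parser treats the feed as HTML, where <link> is an empty element, so the URL ends up outside the tag and news.link.text is an empty string. The "xml" parser keeps the URL inside <link>.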
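The difference between the two parsers can be checked offline on a tiny hand-written feed. This is only a sketch: the sample markup and the example.com URL are made up for illustration, and it assumes both bs4 and lxml are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical one-item feed, standing in for the real Google News RSS.
sample_feed = (
    "<rss><channel><item>"
    "<title>Example headline</title>"
    "<link>https://example.com/story</link>"
    "</item></channel></rss>"
)

# "lxml" parses the feed as HTML; "xml" parses it as XML.
html_item = BeautifulSoup(sample_feed, "lxml").find("item")
xml_item = BeautifulSoup(sample_feed, "xml").find("item")

html_link_text = html_item.link.text  # empty: <link> is a void element in HTML
xml_link_text = xml_item.link.text    # the URL stays inside <link>

print(repr(html_link_text))
print(repr(xml_link_text))
```

With the HTML parser the URL is not lost, it is just moved outside the <link> tag, which is why the original loop prints a blank line where the link should be.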
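The HTML produced by the "lxml" parser has an odd structure: the URL is not inside the <link> tag but sits right after it as a separate text node. If you change the for loop in your code to this: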
for news in news_list:
    link = news.select_one('title')
    print(link.text)
    print(link.next_sibling.next_sibling)
    print("-" * 10)
you should get the titles together with their links.
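To see why walking the siblings works, it may help to list the children of one item as the "lxml" parser builds them. This is a sketch on a made-up one-item feed (the markup and the example.com URL are assumptions, not the real Google News data), assuming bs4 and lxml are installed.

```python
from bs4 import BeautifulSoup

# Hypothetical feed with no whitespace between the tags.
sample_feed = (
    "<rss><channel><item>"
    "<title>Example headline</title>"
    "<link>https://example.com/story</link>"
    "</item></channel></rss>"
)

item = BeautifulSoup(sample_feed, "lxml").find("item")

# Under the HTML parser the item holds three children:
# the <title> tag, an empty <link/> tag, and the URL as a bare text node.
children = list(item.children)
for child in children:
    print(repr(child))

# Two equivalent ways to reach the URL text node.
url_from_title = item.title.next_sibling.next_sibling
url_from_link = item.link.next_sibling
print(url_from_link)
```

Note that this depends on the exact layout of the feed: if there were whitespace or other nodes between the tags, the sibling hops would land on a different node, so switching to the "xml" parser is the less fragile fix.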