How to return the values (title, publication date, link) from XML and store them in a list?

Question

I'm using a Python class and ElementTree (xml.etree.ElementTree) to parse the XML.

Here is my class so far:

class Story:
    def __init__(self, title, link, pub_date):
        # Store the three fields of one news item
        self.title = title
        self.link = link
        self.pub_date = pub_date

    def __str__(self):
        # "Title. (date)" followed by the link on the next line
        return self.title + '. (' + self.pub_date + ')' + '\n' + self.link

And here is my XML parsing code:

import urllib.request
import xml.etree.ElementTree as ET

url = 'https://www.yahoo.com/news/rss'

with urllib.request.urlopen(url) as response:
    data = response.read()
    root = ET.fromstring(data)
    channel = root[0]
    for news_title in channel.iter('title'):
        print(news_title.text + '\n')
    for news_pub_date in channel.iter('pubDate'):
        print(news_pub_date.text + '\n')
    for news_link in channel.iter('link'):
        print(news_link.text + '\n')

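I can print the titles, publication dates, and links, but only separately. How can I use the class to store each story in a list so that I can print results like this:

The White House has 132 rooms and its own restaurant. Here's what Joe Biden's new home looks like inside. (2021-02-25T20:21:03Z) https://news.yahoo.com/us-bombs-facilities-syria-used-003717572.html

Why is the Texas snowstorm drawing anti-Biden conspiracy theories? (2021-02-26T00:37:17Z) https://news.yahoo.com/sturgeon-blasts-salmond-faces-claim-193113374.html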

The assignment requires me to write a function that fetches the contents and returns them as a list:

from typing import List

def get_contents(source='yahoo') -> List[Story]:
    contents = []

    # TODO: fetch the feed for `source` and append one Story per item

    return contents
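A minimal sketch of one way to fill in get_contents, combining the Story class with the findall()/find() approach from the answers. The FEEDS dictionary and the parse_stories() helper are hypothetical names introduced here, and it assumes the assignment's return type means a list of Story objects. Splitting parsing from downloading lets you test the parser on a local string without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical mapping from source name to feed URL (an assumption;
# only the Yahoo URL comes from the question).
FEEDS = {"yahoo": "https://www.yahoo.com/news/rss"}


class Story:
    """One news item: title, link and publication date."""

    def __init__(self, title: str, link: str, pub_date: str) -> None:
        self.title = title
        self.link = link
        self.pub_date = pub_date

    def __str__(self) -> str:
        return self.title + ". (" + self.pub_date + ")\n" + self.link


def parse_stories(data: bytes) -> List[Story]:
    """Build one Story per <item> in an RSS 2.0 document."""
    root = ET.fromstring(data)
    contents = []
    for item in root.findall("channel/item"):
        # findtext() returns the default instead of failing when a child is missing.
        contents.append(Story(
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        ))
    return contents


def get_contents(source: str = "yahoo") -> List[Story]:
    """Download the feed for `source` and return its stories as a list."""
    with urllib.request.urlopen(FEEDS[source]) as response:
        return parse_stories(response.read())
```

Usage: `for story in get_contents('yahoo'): print(story)` downloads the live feed, so the stories you see depend on what Yahoo is publishing at the time.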

Thanks for your help.

Answer 1

Here is a more efficient list comprehension for the XML you provided:

contents = [[ch[0].text,ch[1].text, ch[2].text] for ch in channel[8:]]

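This does the same thing as the older code below, but without iterating separately over every title, pubDate, and link. It does depend on the feed's layout: it assumes that every child of channel from index 8 onward is an item, and that the first three children of each item are the fields you want.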
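Because the positional one-liner can silently pick the wrong elements if the channel layout changes, here is a sketch of the same list built by tag name instead. The sample feed is made up for illustration; its second item lists its children in a different order, which a ch[0]/ch[1]/ch[2] lookup would get wrong.

```python
import xml.etree.ElementTree as ET

# Made-up feed; the second item lists its children in a different order.
SAMPLE = b"""<rss version="2.0"><channel>
<title>Example News</title>
<link>https://example.com/</link>
<item><title>First</title><link>https://example.com/1</link><pubDate>2021-02-25T20:21:03Z</pubDate></item>
<item><pubDate>2021-02-26T00:37:17Z</pubDate><title>Second</title><link>https://example.com/2</link></item>
</channel></rss>"""

channel = ET.fromstring(SAMPLE).find("channel")

# Select <item> children by tag and each field by name, so neither the
# item's position in <channel> nor the order of its children matters.
contents = [
    [item.findtext("title"), item.findtext("pubDate"), item.findtext("link")]
    for item in channel.findall("item")
]
for row in contents:
    print(row)
```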

Older version: this seems to be a workable solution using list comprehensions:

titles = [nt.text for nt in channel.iter('title')]
dates = [pd.text for pd in channel.iter('pubDate')]
links = [nl.text for nl in channel.iter('link')]
contents = [[titles[i], dates[i], links[i]] for i in range(len(titles)-1)]
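Pairing three independent lists by index only works if they have the same length and order. channel.iter() searches the whole subtree, so in a feed where the channel itself (or its image element) also has a title, the lists drift apart. A sketch on a made-up feed:

```python
import xml.etree.ElementTree as ET

# Made-up feed: the channel and its <image> each carry a <title> of their own.
SAMPLE = b"""<rss version="2.0"><channel>
<title>Example News</title>
<image><title>Example News</title><url>https://example.com/logo.png</url></image>
<item><title>First</title><link>https://example.com/1</link><pubDate>2021-02-25T20:21:03Z</pubDate></item>
<item><title>Second</title><link>https://example.com/2</link><pubDate>2021-02-26T00:37:17Z</pubDate></item>
</channel></rss>"""

channel = ET.fromstring(SAMPLE).find("channel")

# iter() walks the whole subtree, so channel-level titles are collected too.
titles = [t.text for t in channel.iter("title")]
dates = [d.text for d in channel.iter("pubDate")]
print(len(titles), len(dates))  # 4 2
# Index 0 pairs the channel's own title with the first item's date.
print(titles[0], dates[0])  # Example News 2021-02-25T20:21:03Z

# Collecting the fields per <item> keeps each title next to its own date.
items = channel.findall("item")
aligned = list(zip(
    (i.findtext("title") for i in items),
    (i.findtext("pubDate") for i in items),
))
print(aligned)
```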

Answer 2

You can use findall() to get all item elements and then use find() to get the title, link, and pubDate children of each item, for example:

import urllib.request
import xml.etree.ElementTree as ET

url = 'https://www.yahoo.com/news/rss'
with urllib.request.urlopen(url) as response:
    data = response.read()
    root = ET.fromstring(data)
    for item in root.findall('channel/item'):
        print(item.find('title').text)
        print(item.find('link').text)
        print(item.find('pubDate').text)
        print()
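One caveat with item.find(...).text: if an item lacks one of the children, find() returns None and accessing .text raises AttributeError. A minimal sketch, on a made-up feed whose second item has no pubDate, using findtext() with a default and collecting the rows into a list; the "(no date)" placeholder is an arbitrary choice.

```python
import xml.etree.ElementTree as ET

# Made-up feed in which the second item has no <pubDate>.
SAMPLE = b"""<rss version="2.0"><channel>
<item><title>First</title><link>https://example.com/1</link><pubDate>2021-02-25T20:21:03Z</pubDate></item>
<item><title>Second</title><link>https://example.com/2</link></item>
</channel></rss>"""

root = ET.fromstring(SAMPLE)
rows = []
for item in root.findall("channel/item"):
    # item.find("pubDate") is None for the second item, so .text would raise
    # AttributeError; findtext() returns the given default instead.
    rows.append((
        item.findtext("title", default=""),
        item.findtext("link", default=""),
        item.findtext("pubDate", default="(no date)"),
    ))

for title, link, pub_date in rows:
    print(f"{title}. ({pub_date})\n{link}\n")
```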


Tags: python, xml, list, elementtree