Import RSS with FeedParser and Get Both Posts and General Information to Single Pandas DataFrame

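As a Python newbie, I am practicing importing data in Python. Ultimately, I want to analyze data from different podcasts (information on the podcast itself and on each episode) by putting the data into one coherent DataFrame and processing it with NLP.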

So far, I have managed to read a list of RSS feeds and get the information on every episode (post) of each feed.

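But I am struggling to find an integrated workflow in Python that gathers both of the following in one go:

  1. the information on every episode (post) of the RSS feed
  2. the general information on the RSS feed (such as the title of the podcast)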

Code: this is what I have so far

import feedparser
import pandas as pd

rss_feeds = [
    'http://feeds.feedburner.com/TEDTalks_audio',
    'https://joelhooks.com/rss.xml',
    'https://www.sciencemag.org/rss/podcast.xml',
]
# number of feeds is reduced for testing

posts = []
for url in rss_feeds:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((post.title, post.link, post.summary))

df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])

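Output: the DataFrame contains 652 non-null objects in three columns (as expected), basically every post of every podcast. The title column refers to the title of the episode, but not to the title of the podcast (in this case 'TED Talks Daily').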

title link summary
0 3 questions to ask yourself about everything y... https://www.ted.com/talks/stacey_abrams_3_ques... How you respond to setbacks is what defines yo...
1 What your sleep patterns say about your relati... https://www.ted.com/talks/tedx_shorts_what_you... Wendy Troxel looks at the cultural expectation...
2 How we can actually pay people enough -- with ... https://www.ted.com/talks/ted_business_how_we_... Capitalism urgently needs an upgrade, says Pay...

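I am struggling to find a way to also include the title of the podcast in this DataFrame. I always get an error when I try to select parts of the general feed information, e.g. ['feed']['title'].

Thanks for every hint!

Source: what I have so far is based on this source: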

You can use feed.feed.title in this case:

Accessing the feed title
# ...
for url in rss_feeds:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((feed.feed.title, post.title, post.link, post.summary))

df = pd.DataFrame(posts, columns=['feed_title', 'title', 'link', 'summary'])
df

Output:

          feed_title            title             link          summary
0    TED Talks Daily  3 ways compa...  https://www....  When we expe...
1    TED Talks Daily  How we could...  https://www....  Concrete is ...
2    TED Talks Daily  3 questions ...  https://www....  How you resp...
3    TED Talks Daily  What your sl...  https://www....  Wendy Troxel...
4    TED Talks Daily  How we can a...  https://www....  Capitalism u...
..               ...              ...              ...              ...
649  Science Maga...  Science Podc...  https://traf...  Fear-enhance...
650  Science Maga...  Science Podc...  https://traf...  Discussing t...
651  Science Maga...  Science Podc...  https://traf...  Talking kids...
652  Science Maga...  Science Podc...  https://traf...  The minimum ...
653  Science Maga...  Science Podc...  https://traf...  The origin o...
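
Not every feed or entry is guaranteed to carry every field, and attribute access such as post.summary fails when a field is absent. As a minimal sketch, assuming the parsed result exposes `feed` and `entries` with dict-style `.get()` (as the objects returned by feedparser do), the row building could be made tolerant of missing fields. The helper name `rows_from_feed` and the sample data are hypothetical and exist only for illustration; the stand-in object replaces a real `feedparser.parse(url)` result so the sketch runs offline.

```python
from types import SimpleNamespace


def rows_from_feed(parsed, fields=("title", "link", "summary")):
    """Return one dict per entry, each tagged with the feed-level title.

    Hypothetical helper: `parsed` is assumed to look like the result of
    feedparser.parse(url), i.e. it has `.feed` and `.entries`.
    """
    # .get() returns a default instead of raising when a field is missing.
    feed_title = parsed.feed.get("title", "")
    rows = []
    for entry in parsed.entries:
        row = {"feed_title": feed_title}
        for name in fields:
            row[name] = entry.get(name, "")
        rows.append(row)
    return rows


# Stand-in for a parsed feed (made-up data, not a real podcast).
sample = SimpleNamespace(
    feed={"title": "Example Podcast"},
    entries=[
        {"title": "Episode 1", "link": "https://example.com/1", "summary": "First"},
        {"title": "Episode 2", "link": "https://example.com/2"},  # no summary
    ],
)

rows = rows_from_feed(sample)
```

With the real library, the rows for all feeds could be collected by calling rows_from_feed(feedparser.parse(url)) for each URL, extending one list, and passing that list of dicts to pd.DataFrame, which takes the column names from the dict keys.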