Import RSS with FeedParser and Get Both Posts and General Information into a Single Pandas DataFrame

Question

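As a Python beginner, I am practicing importing data in Python. Ultimately I want to analyze data from different podcasts (information about the podcast itself and about each episode) by putting it into one coherent DataFrame and processing it with NLP.

So far I have managed to read a list of RSS feeds and get the information for each episode (post) of each feed.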

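But I am struggling to find an integrated workflow in Python that collects both

information about each episode (post) of the RSS feed
and general information about the RSS feed (such as the podcast title), all in one go.

Code: this is what I have so far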

import feedparser
import pandas as pd

rss_feeds = [
    'http://feeds.feedburner.com/TEDTalks_audio',
    'https://joelhooks.com/rss.xml',
    'https://www.sciencemag.org/rss/podcast.xml',
]
# The number of feeds is reduced for testing

posts = []
for url in rss_feeds:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((post.title, post.link, post.summary))

df = pd.DataFrame(posts, columns=['title', 'link', 'summary'])

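The resulting DataFrame has three columns with 652 non-null objects each (as expected), i.e. one row for every post in every podcast. The title column holds the episode title, not the podcast title (in this case 'TED Talks Daily').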

	title	link	summary
0	3 questions to ask yourself about everything y...	https://www.ted.com/talks/stacey_abrams_3_ques...	How you respond to setbacks is what defines yo...
1	What your sleep patterns say about your relati...	https://www.ted.com/talks/tedx_shorts_what_you...	Wendy Troxel looks at the cultural expectation...
2	How we can actually pay people enough -- with ...	https://www.ted.com/talks/ted_business_how_we_...	Capitalism urgently needs an upgrade, says Pay...

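I am also struggling to find a way to include the podcast title in this DataFrame. I always get an error when I try to select parts of the feed-level information, e.g. ['feed']['title'].

Thanks for any hints!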

Source: what I have so far is based on this source:

Answer 1

In this case you can use feed.feed.title:

Accessing the feed title

# ...
for url in rss_feeds:
    feed = feedparser.parse(url)
    for post in feed.entries:
        posts.append((feed.feed.title, post.title, post.link, post.summary))

df = pd.DataFrame(posts, columns=['feed_title', 'title', 'link', 'summary'])
df

Output:

          feed_title            title             link          summary
0    TED Talks Daily  3 ways compa...  https://www....  When we expe...
1    TED Talks Daily  How we could...  https://www....  Concrete is ...
2    TED Talks Daily  3 questions ...  https://www....  How you resp...
3    TED Talks Daily  What your sl...  https://www....  Wendy Troxel...
4    TED Talks Daily  How we can a...  https://www....  Capitalism u...
..               ...              ...              ...              ...
649  Science Maga...  Science Podc...  https://traf...  Fear-enhance...
650  Science Maga...  Science Podc...  https://traf...  Discussing t...
651  Science Maga...  Science Podc...  https://traf...  Talking kids...
652  Science Maga...  Science Podc...  https://traf...  The minimum ...
653  Science Maga...  Science Podc...  https://traf...  The origin o...
