Identify itunes:keywords and itunes:category individually with feedparser?

Question

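I am using feedparser to parse RSS feeds such as https://www.relay.fm/analogue/feed, but I cannot figure out how to specifically identify the itunes:category values.

Looking at the feedparser itunes tests, it seems that both the itunes:keywords and itunes:category values are put into the feed['tags'] list.

From the test for category: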

<!--
Description: iTunes channel category
Expect:      not bozo and feed['tags'][0]['term'] == 'Technology'
-->
<rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd">
    <channel>
        <itunes:category text="Technology"></itunes:category>
    </channel>
</rss>

and then for keywords:

<!--
Description: iTunes channel keywords
Expect:      not bozo and feed['tags'][0]['term'] == 'Technology' and 'itunes_keywords' not in feed
-->
<rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd">
    <channel>
        <itunes:keywords>Technology</itunes:keywords>
    </channel>
</rss>

For the example feed above, the entries are:

<itunes:keywords>Hurley, Liss, feelings</itunes:keywords>

and

<itunes:category text="Society &amp; Culture"/>
<itunes:category text="Technology"/>

which results in feed['tags'] being populated as:

[{'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Hurley'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Liss'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'feelings'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Society & Culture'},
 {'label': None, 'scheme': 'http://www.itunes.com/', 'term': 'Technology'}]

Is there any way to uniquely identify the values that come from the itunes:category tags?

Answer 1

I could not find a way to do this with feedparser alone either, so I made use of BeautifulSoup:

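import bs4

# raw_data is the feed XML (str or bytes), e.g. the body of the HTTP response
soup = bs4.BeautifulSoup(raw_data, "lxml")

def is_itunes_category(tag):
    # match on the prefixed tag name
    return tag.name == 'itunes:category'

# the category name lives in the "text" attribute of each element
categories = [tag.attrs['text'] for tag in soup.find_all(is_itunes_category)]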
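If installing BeautifulSoup and lxml is not an option, the same idea (re-reading the raw XML next to feedparser) can probably be done with the standard library. This is only a minimal sketch: ITUNES_NS, SAMPLE_FEED and itunes_categories_and_keywords are names made up for illustration, the namespace URI is the one from the test files in the question and has to match the feed's own xmlns:itunes declaration exactly, and only top-level categories directly under channel are read (nested subcategories are ignored).

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI as declared in the test files quoted in the question.
# A real feed may declare a different URI; it must match exactly.
ITUNES_NS = "http://www.itunes.com/DTDs/Podcast-1.0.dtd"

# Small inline sample standing in for the downloaded feed body.
SAMPLE_FEED = """<rss xmlns:itunes="http://www.itunes.com/DTDs/Podcast-1.0.dtd">
  <channel>
    <itunes:keywords>Hurley, Liss, feelings</itunes:keywords>
    <itunes:category text="Society &amp; Culture"/>
    <itunes:category text="Technology"/>
  </channel>
</rss>"""

def itunes_categories_and_keywords(xml_text):
    """Return (categories, keywords) read from the channel element."""
    channel = ET.fromstring(xml_text).find("channel")
    # Direct children only, so nested subcategories are not included.
    categories = [
        el.get("text")
        for el in channel.findall("{%s}category" % ITUNES_NS)
    ]
    keywords = []
    keywords_el = channel.find("{%s}keywords" % ITUNES_NS)
    if keywords_el is not None and keywords_el.text:
        # Keywords are a single comma-separated string.
        keywords = [k.strip() for k in keywords_el.text.split(",") if k.strip()]
    return categories, keywords

categories, keywords = itunes_categories_and_keywords(SAMPLE_FEED)
print(categories)
print(keywords)
```

Because the two elements are looked up by name, the categories and keywords never end up in the same list, which is the separation the question asks for.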

Answer 2

feedparser v6.0.2 implements specific itunes:x attributes.

In feedparser, itunes:category is available through category, which returns the term of the first entry in tags:

import feedparser
feedp = feedparser.parse(url)
category = feedp.feed.category

itunes:keywords is indeed renamed to tags in feedparser, and the values are stored under term.

However, channel keywords and item keywords are mixed together with any other tags. To pick out the iTunes keywords, use scheme as a filter:

import feedparser
feedp = feedparser.parse(url)
# all tag terms found on the feed (channel) level
keywords = [k["term"] for k in feedp["feed"]["tags"]]
# only the terms whose scheme marks them as coming from itunes: elements
keyword = [t["term"] for t in feedp["feed"]["tags"] if t["scheme"] == 'http://www.itunes.com/']

If other tags are present, this filter may drop them, but when itunes:keywords and regular tags coexist they are duplicates of each other.
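To see what the scheme filter actually keeps, here is a small sketch that runs it on a plain list of dicts instead of a live feed. The first five entries are copied from the output in the question; the last entry, its scheme, and the helper name terms_with_scheme are invented for illustration.

```python
# Stand-in for feedp["feed"]["tags"]: the five entries from the question
# plus one hypothetical non-iTunes tag (its scheme is invented).
tags = [
    {"label": None, "scheme": "http://www.itunes.com/", "term": "Hurley"},
    {"label": None, "scheme": "http://www.itunes.com/", "term": "Liss"},
    {"label": None, "scheme": "http://www.itunes.com/", "term": "feelings"},
    {"label": None, "scheme": "http://www.itunes.com/", "term": "Society & Culture"},
    {"label": None, "scheme": "http://www.itunes.com/", "term": "Technology"},
    {"label": None, "scheme": "http://example.com/topics", "term": "podcast"},
]

ITUNES_SCHEME = "http://www.itunes.com/"

def terms_with_scheme(tags, scheme):
    """Return the terms of all tags that carry the given scheme."""
    return [t["term"] for t in tags if t.get("scheme") == scheme]

itunes_terms = terms_with_scheme(tags, ITUNES_SCHEME)
print(itunes_terms)
```

With this data the non-iTunes tag is dropped, but the keywords and the categories both stay in the result, since in the question's output they share the same scheme. So the filter on its own separates iTunes tags from other tags, not keywords from categories.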

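itunes:duration is available on each entry as itunes_duration:

import feedparser
feedp = feedparser.parse(url)
# itunes:duration is an item-level element, so read it from an entry
duration = feedp.entries[0]["itunes_duration"]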
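The duration comes back as a string, so it usually needs converting before you can do arithmetic with it. A minimal sketch, assuming the value is either a plain number of seconds or a colon-separated MM:SS / HH:MM:SS string, which is what podcast feeds commonly use; duration_to_seconds is a hypothetical helper, not part of feedparser.

```python
def duration_to_seconds(value):
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' to a number of seconds."""
    parts = [int(p) for p in value.strip().split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("unexpected duration format: %r" % value)
    seconds = 0
    # Fold left to right: each step shifts the running total by a factor of 60.
    for part in parts:
        seconds = seconds * 60 + part
    return seconds

print(duration_to_seconds("1:02:03"))
print(duration_to_seconds("45:10"))
print(duration_to_seconds("3600"))
```

Values that are not made of integers and colons raise ValueError, so a real feed loop would probably want to catch that and skip the entry.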

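A bit off topic, but to complete the answer:

If multiple categories are available, they are exposed in categories as a list of tuples, as described in the documentation: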

>>> import feedparser
>>> feedp = feedparser.parse(url)
>>> categories = feedp.feed.categories
>>> print(categories)
[(u'Syndic8', u'1024'),
(u'dmoz', 'Top/Society/People/Personal_Homepages/P/')]

But since iTunes does not have multiple categories...

There is no longer any need to re-parse with BeautifulSoup4.
