Return RSS attribute values via BeautifulSoup

Question

RSS (in a file named myfeed.rss):

<?xml version="1.0" encoding="utf-8" ?>
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/">
    <channel>
    <title>MyFeed</title>
    <link>http://website</link>
    <description>RSS Feed</description>
    <language>en-us</language>

            <item>
                <title>title goes here</title> 
                <pubDate>Tue, 09 Jun 2015 15:15:23 -0600</pubDate>
                <category>x264</category>
                <link>https://link_goes_here</link>
                <description>various HTML goes here</description>
                <guid>https://another_link_goes_here</guid>
                <newznab:attr name="category" value="6000" />
                <newznab:attr name="category" value="6040" />

                <newznab:attr name="size" value="1923203792" />

                <newznab:attr name="grabs" value="3" />

                <newznab:attr name="comments" value="0" />
                <newznab:attr name="password" value="0" />
                <newznab:attr name="usenetdate" value="Tue 09 Jun 2015 15:15:23 -0600" />
            </item>

    </channel>
    </rss>

Python script:

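#!/usr/bin/env python

from bs4 import BeautifulSoup

# Read the whole feed into memory before parsing.
with open('myfeed.rss') as feed_file:
    handler = feed_file.read()
soup = BeautifulSoup(handler, 'xml')

for item in soup.find_all('item'):
    print(item)
    print("-----------------------------------")
    # This is the lookup that comes back without any newznab attributes.
    print(item.find_all('newznab:attr'))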

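Result: The item prints out. My divider line prints out. But none of the newznab attributes print.

Question: How can I access each of the "newznab" attributes? So far I have not been able to figure out how to retrieve them as a dictionary. I am new to Python. :)

Thanks.

Edit: Thanks to Rick's suggestion, I can now access the attributes as follows:

Updated Python script: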

#!/usr/bin/env python

from bs4 import BeautifulSoup

# Read the whole feed into memory before parsing.
with open('myfeed.rss') as feed_file:
    handler = feed_file.read()
soup = BeautifulSoup(handler, 'lxml')

for item in soup.find_all('item'):
    print(item)
    print("-----------------------------------")

    newznabs = item.find_all('newznab:attr')
    newz_dict = {}

    # Map each attribute name to its value; a repeated name
    # (e.g. 'category') keeps only the last value seen.
    for attribute in newznabs:
        newz_dict[attribute['name'].split(".")[0]] = attribute['value'].split(".")[0]

    print("newz_dict: [{}]".format(newz_dict))
    print("size: [{}]".format(newz_dict['size']))

    print("+++++++++++++++++++++++++++++++++++")

Now I have the attributes in a dictionary. :)
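One thing to watch with a plain dictionary: the sample item has two newznab:attr tags named "category" (6000 and 6040), so the second overwrites the first. Below is a minimal sketch of one way to keep both, assuming you are fine with every value being a list. The helper name collect_newznab_attrs and the trimmed-down SAMPLE_ITEM string are made up for illustration, and the sketch uses Python's built-in 'html.parser' only so that it runs without lxml installed; the 'lxml' parser is the one reported to work in this thread.

```python
from collections import defaultdict

from bs4 import BeautifulSoup

# Trimmed-down copy of the <item> from the question (illustrative only).
SAMPLE_ITEM = """
<item>
  <guid>https://another_link_goes_here</guid>
  <newznab:attr name="category" value="6000" />
  <newznab:attr name="category" value="6040" />
  <newznab:attr name="size" value="1923203792" />
</item>
"""


def collect_newznab_attrs(item):
    """Map each newznab attribute name to a list of all its values."""
    grouped = defaultdict(list)
    for tag in item.find_all('newznab:attr'):
        grouped[tag['name']].append(tag['value'])
    return dict(grouped)


# Assumption: 'html.parser' stands in for 'lxml' to avoid the extra dependency.
soup = BeautifulSoup(SAMPLE_ITEM, 'html.parser')
attrs = collect_newznab_attrs(soup.find('item'))
print(attrs['category'])  # both category values are kept
print(attrs['size'])
```

Single-valued attributes such as size then come back as one-element lists, so you would read attrs['size'][0].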

Answer 1

Telling BeautifulSoup to use the lxml parser seems to handle the self-closing tags properly.

Try using:

soup = BeautifulSoup(handler, 'lxml')
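Note that 'lxml' selects BeautifulSoup's HTML parser, so the feed is being read as HTML and "newznab:attr" is matched as a plain tag name. If you would rather have namespace-aware XML parsing, here is a rough sketch of an alternative that does not use BeautifulSoup at all; it uses the standard library's xml.etree.ElementTree instead. The FEED string is a shortened stand-in for myfeed.rss and the helper name newznab_attrs is made up for this example.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping; the URI is the one declared in the feed's <rss> tag.
NAMESPACES = {"newznab": "http://www.newznab.com/DTD/2010/feeds/attributes/"}

# Shortened stand-in for myfeed.rss (illustrative only).
FEED = """<rss version="2.0" xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/">
<channel>
  <item>
    <guid>https://another_link_goes_here</guid>
    <newznab:attr name="size" value="1923203792" />
    <newznab:attr name="grabs" value="3" />
  </item>
</channel>
</rss>"""


def newznab_attrs(item):
    """Return a {name: value} dict for the newznab:attr children of an item."""
    return {
        el.get("name"): el.get("value")
        for el in item.findall("newznab:attr", NAMESPACES)
    }


root = ET.fromstring(FEED)
for item in root.iter("item"):
    attrs = newznab_attrs(item)
    print(attrs["size"])
```

For the real file you would parse with ET.parse('myfeed.rss').getroot() instead of ET.fromstring(FEED).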

python

rss

attributes

beautifulsoup