我如何正确地从这个 xml 中提取信息

How do I properly extract information from this xml

我正在尝试查询 xml 文档以打印出与较低级别元素关联的较高级别元素属性。我得到的结果与 xml 结构不一致。基本上这是我到目前为止的代码。

import xml.etree.ElementTree as ET

tree = ET.parse('movies2.xml') root = tree.getroot()

for child in root:
    print(child.tag, child.attrib) print()

mov = root.findall("./genre/decade/movie/[year='2000']")
for movie in mov:
    print(child.attrib['category'], movie.attrib['title'])

这会产生这个-

genre {'category': 'Action'}
genre {'category': 'Thriller'}
genre {'category': 'Comedy'}

Comedy X-Men
Comedy American Psycho

如果检查 xml-

,最后两行实际上应该列出与电影标题相关的两种不同类型属性
Action X-Men
Thriller American Psycho

这是xml供参考-

<?xml version='1.0' encoding='utf8'?>
<collection>
    <genre category="Action">
        <decade years="1980s">
            <movie favorite="True" title="Indiana Jones: The raiders of 
              the lost Ark">
                <format multiple="No">DVD</format>
                <year>1981</year>
                <rating>PG</rating>
                <description>
                'Archaeologist and adventurer Indiana Jones 
                is hired by the U.S. government to find the Ark of the 
                Covenant before the Nazis.'
                </description>
            </movie>
               <movie favorite="True" title="THE KARATE KID">
               <format multiple="Yes">DVD,Online</format>
               <year>1984</year>
               <rating>PG</rating>
               <description>None provided.</description>
            </movie>
            <movie favorite="False" title="Back 2 the Future">
               <format multiple="False">Blu-ray</format>
               <year>1985</year>
               <rating>PG</rating>
               <description>Marty McFly</description>
            </movie>
        </decade>
        <decade years="1990s">
            <movie favorite="False" title="X-Men">
               <format multiple="Yes">dvd, digital</format>
               <year>2000</year>
               <rating>PG-13</rating>
               <description>Two mutants come to a private academy for     >                     their kind whose resident superhero team must 
                oppose a terrorist organization with similar powers. 
               </description>
            </movie>
            <movie favorite="True" title="Batman Returns">
               <format multiple="No">VHS</format>
               <year>1992</year>
               <rating>PG13</rating>
               <description>NA.</description>
            </movie>
               <movie favorite="False" title="Reservoir Dogs">
               <format multiple="No">Online</format>
               <year>1992</year>
               <rating>R</rating>
               <description>WhAtEvER I Want!!!?!</description>
            </movie>
        </decade>    
    </genre>

    <genre category="Thriller">
        <decade years="1970s">
            <movie favorite="False" title="ALIEN">
                <format multiple="Yes">DVD</format>
                <year>1979</year>
                <rating>R</rating>
                <description>"""""""""</description>
            </movie>
        </decade>
        <decade years="1980s">
            <movie favorite="True" title="Ferris Bueller's Day Off">
                <format multiple="No">DVD</format>
                <year>1986</year>
                <rating>PG13</rating>
                <description>Funny movie about a funny guy</description>
            </movie>
            <movie favorite="FALSE" title="American Psycho">
                <format multiple="No">blue-ray</format>
                <year>2000</year>
                <rating>Unrated</rating>
                <description>psychopathic Bateman</description>
            </movie>
        </decade>
    </genre>

    <genre category="Comedy">
        <decade years="1960s">
            <movie favorite="False" title="Batman: The Movie">
                <format multiple="Yes">DVD,VHS</format>
                <year>1966</year>
                <rating>PG</rating>
                <description>What a joke!</description>
            </movie>
        </decade>
        <decade years="2010s">
            <movie favorite="True" title="Easy A">
                <format multiple="No">DVD</format>
                <year>2010</year>
                <rating>PG--13</rating>
                <description>Emma Stone = Hester Prynne</description>
            </movie>
            <movie favorite="True" title="Dinner for SCHMUCKS">
                <format multiple="Yes">DVD,digital,Netflix</format>
                <year>2011</year>
                <rating>Unrated</rating>
                <description>Tim (Rudd) is a rising executive who 
                 'succeeds' in finding the perfect guest, IRS employee 
                 Barry (Carell), for his boss' monthly event, a so-called 
                 'dinner for idiots,' which offers certain advantages to 
                 the exec who shows up with the biggest buffoon.
                </description>
            </movie>
        </decade>
        <decade years="1980s">
            <movie favorite="False" title="Ghostbusters">
                <format multiple="No">Online,VHS</format>
                <year>1984</year>
                <rating>PG</rating>
                <description>Who ya gonna call?</description>
            </movie>
        </decade>
        <decade years="1990s">
            <movie favorite="True" title="Robin Hood: Prince of Thieves">
                <format multiple="No">Blu_Ray</format>
                <year>1991</year>
                <rating>Unknown</rating>
                <description>Robin Hood slaying</description>
            </movie>
        </decade>
    </genre>
</collection>

你的初始循环:

for child in root:
    print(child.tag, child.attrib) print()

child 设置为最后一个 child;因此 child.attrib['category'] 将始终是最后一个 child 的类别。在你的例子中,最后一个 child 是一部喜剧。对于第二个循环中的每部电影:

for movie in mov:
   print(child.attrib['category'], movie.attrib['title'])

您正在打印在第一个循环中找到的最后一个 child 的类别;所以他们都打印 "Comedy".

编辑:这将至少 select 具有正确流派标签的相同电影,但顺序可能不同:

for child in root:
    mov = child.findall("./decade/movie/[year='2000']")
    for movie in mov:
        print(child.attrib['category'], movie.attrib['title'])

另一种方法,使用lxml代替elementree:

from lxml import etree as ET

tree = ET.parse('movies2.xml')
root = tree.getroot()

mov = root.findall("./genre/decade/movie/[year='2000']")
for movie in mov:
    print(movie.getparent().getparent().attrib['category'], movie.attrib['title'])