Getting the correct result from complicated XML

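I want to output the correct userID, itemID and related balance for each bill, and then export the results.

Each 'user' can have many 'items', and each item has a balance. The userID may repeat for each item.

I am using the following code, but it repeats the itemID/userID combinations: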

    userid = node.findtext('./userID')
    itemids = node.findall('./bill/item/itemID')
    bills = node.findall(".//bill/balance")

    for item in itemids:
        for bill in bills:
            print(userid, item.text, bill.text)

Here is an example of the XML:

<user>
    <userID>10269</userID>
    <name>
        <displayName>SAFIYA NASSER ABDULLAH AL SIYABI</displayName>
        <firstName>SAFIYA</firstName>
        <middleName>NASSER ABDULLAH</middleName>
        <lastName>AL SIYABI</lastName>
    </name>
    <library>MAIN</library>
    <numberOfBills>3</numberOfBills>
    <bill>
        <item>
            <callNumber>BP173.4 .B57 2003</callNumber>
            <copyNumber>1</copyNumber>
            <itemID>423999</itemID>
            <library>MAIN</library>
            <dateCreated>2009-02-15</dateCreated>
            <isPermanent>true</isPermanent>
        </item>
        <amount currency="OR">1.20</amount>
        <reason>OVERDUE</reason>
        <balance currency="OR">1.20</balance>
        <library>MAIN</library>
    </bill>
    <bill>
        <item>
            <callNumber>BP173.3 .G423 2004</callNumber>
            <copyNumber>2</copyNumber>
            <itemID>429053</itemID>
            <library>MAIN</library>
            <dateCreated>2009-02-15</dateCreated>
            <isPermanent>true</isPermanent>
        </item>
        <amount currency="OR">1.20</amount>
        <reason>OVERDUE</reason>
        <balance currency="OR">1.20</balance>
        <library>MAIN</library>
    </bill>
    <bill>
        <item>
            <callNumber>BP173.3 .N34 2003</callNumber>
            <copyNumber>1</copyNumber>
            <itemID>423991</itemID>
            <library>MAIN</library>
            <dateCreated>2009-02-15</dateCreated>
            <isPermanent>true</isPermanent>
        </item>
        <amount currency="OR">24.00</amount>
        <reason>OVERDUE</reason>
        <balance currency="OR">24.00</balance>
        <library>MAIN</library>
    </bill>
</user>

Thanks in advance.

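You are iterating over every <itemID>, and then, for each of them, iterating over every <balance> from the beginning. In effect, the number of itemID elements returned by findall becomes the number of times you loop over all the bills, so every item is paired with every balance. That is not what you want.

Instead, iterate over each bill, and in a nested for loop iterate over the items found under that particular bill rather than every item in the document.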

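userid = node.findtext('userID')
for bill in node.findall('bill'):
    # The balance belongs to this bill only
    balance = bill.findtext('balance')
    # Only the items nested under this bill
    for item in bill.findall('item'):
        itemID = item.findtext('itemID')
        print(userid, itemID, balance)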
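
Put together as a self-contained script, the nested loop might look like the sketch below. The inline XML is a trimmed copy of the sample from the question (two bills, most fields dropped), and the names SAMPLE_XML and rows are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample: two bills, one item each.
SAMPLE_XML = """
<user>
    <userID>10269</userID>
    <bill>
        <item><itemID>423999</itemID></item>
        <balance currency="OR">1.20</balance>
    </bill>
    <bill>
        <item><itemID>423991</itemID></item>
        <balance currency="OR">24.00</balance>
    </bill>
</user>
"""

node = ET.fromstring(SAMPLE_XML)
userid = node.findtext('userID')

rows = []
for bill in node.findall('bill'):
    # One balance per bill
    balance = bill.findtext('balance')
    # Pair that balance only with the items inside the same bill
    for item in bill.findall('item'):
        rows.append((userid, item.findtext('itemID'), balance))

for row in rows:
    print(*row)
# 10269 423999 1.20
# 10269 423991 24.00
```

Each bill contributes exactly one line per item it contains, so the output has two lines here rather than the four that the original double loop over all items and all balances would print.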

Consider list/dict comprehensions to extract selected XML data:

import xml.etree.ElementTree as et

doc = et.parse("Input.xml")

user_bill_list_of_dict = [{'userID': doc.findtext('userID'),
                           'itemID': b.find('item').findtext('itemID'),
                           'balance': b.findtext('balance')
                          } for b in doc.findall('bill')]
         
print(user_bill_list_of_dict)
# [{'userID': '10269', 'itemID': '423999', 'balance': '1.20'}, 
#  {'userID': '10269', 'itemID': '429053', 'balance': '1.20'}, 
#  {'userID': '10269', 'itemID': '423991', 'balance': '24.00'}]
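
Since the question also asks about exporting the results, a list of dictionaries like this can be handed to csv.DictWriter from the standard library. The following is only a sketch: the helper name write_csv, the trimmed inline XML and the in-memory buffer are assumptions for illustration, not part of the original answer.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the question's sample: two bills, one item each.
SAMPLE_XML = """
<user>
    <userID>10269</userID>
    <bill>
        <item><itemID>423999</itemID></item>
        <balance currency="OR">1.20</balance>
    </bill>
    <bill>
        <item><itemID>423991</itemID></item>
        <balance currency="OR">24.00</balance>
    </bill>
</user>
"""

root = ET.fromstring(SAMPLE_XML)

# One dictionary per bill, same shape as in the answer above.
records = [{'userID': root.findtext('userID'),
            'itemID': b.findtext('item/itemID'),
            'balance': b.findtext('balance')}
           for b in root.findall('bill')]


def write_csv(records, fh):
    """Write the records to an open text file handle as CSV (hypothetical helper)."""
    writer = csv.DictWriter(fh, fieldnames=['userID', 'itemID', 'balance'])
    writer.writeheader()
    writer.writerows(records)


# An in-memory buffer keeps the sketch free of side effects.
buffer = io.StringIO()
write_csv(records, buffer)
print(buffer.getvalue())
```

To write a real file, the same helper could be called with a handle from open('bills.csv', 'w', newline=''), where bills.csv is a made-up file name.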

You can even use dictionary merging (available in Python 3.5+) to pull in all of the XML data:

data = [{**{'userID': doc.findtext('userID')},
         **{n.tag:n.text for n in doc.findall('./name/*')},
         **{i.tag:i.text for i in bill.findall('item/*')},
         **{b.tag:b.text for b in bill.findall('*') if b.tag != 'item'},
        } for bill in doc.findall('bill')]

print(data)
# [{'userID': '10269', 'displayName': 'SAFIYA NASSER ABDULLAH AL SIYABI', 
#   'firstName': 'SAFIYA', 'middleName': 'NASSER ABDULLAH', 'lastName': 'AL SIYABI', 
#   'callNumber': 'BP173.4 .B57 2003', 'copyNumber': '1', 'itemID': '423999', 
#   'library': 'MAIN', 'dateCreated': '2009-02-15', 'isPermanent': 'true', 'amount': '1.20',
#   'reason': 'OVERDUE', 'balance': '1.20'}, 
#  {'userID': '10269', 'displayName': 'SAFIYA NASSER ABDULLAH AL SIYABI', 
#   'firstName': 'SAFIYA', 'middleName': 'NASSER ABDULLAH', 'lastName': 'AL SIYABI', 
#   'callNumber': 'BP173.3 .G423 2004', 'copyNumber': '2', 'itemID': '429053', 
#   'library': 'MAIN', 'dateCreated': '2009-02-15', 'isPermanent': 'true', 'amount': '1.20',
#   'reason': 'OVERDUE', 'balance': '1.20'}, 
#  {'userID': '10269', 'displayName': 'SAFIYA NASSER ABDULLAH AL SIYABI', 
#   'firstName': 'SAFIYA', 'middleName': 'NASSER ABDULLAH', 'lastName': 'AL SIYABI',
#   'callNumber': 'BP173.3 .N34 2003', 'copyNumber': '1', 'itemID': '423991', 
#   'library': 'MAIN', 'dateCreated': '2009-02-15', 'isPermanent': 'true', 'amount': '24.00',
#   'reason': 'OVERDUE', 'balance': '24.00'}]
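
One thing to watch when merging by tag name: <library> appears under both <item> and <bill>, so the two values share a single 'library' key and the later one wins. In the sample both are MAIN, so nothing is visibly lost. If the values could differ, one possible workaround is to prefix the keys by their origin. The sketch below assumes a hypothetical helper flatten_bill and uses a made-up BRANCH value in the trimmed XML purely to make the collision visible.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the BRANCH value is invented to show the two libraries differing.
SAMPLE_XML = """
<user>
    <userID>10269</userID>
    <bill>
        <item>
            <itemID>423999</itemID>
            <library>BRANCH</library>
        </item>
        <balance currency="OR">1.20</balance>
        <library>MAIN</library>
    </bill>
</user>
"""

root = ET.fromstring(SAMPLE_XML)


def flatten_bill(user, bill):
    """Flatten one bill into a dict, prefixing keys by origin (hypothetical helper)."""
    row = {'userID': user.findtext('userID')}
    # Children of <item> get an 'item_' prefix
    row.update({'item_' + i.tag: i.text for i in bill.findall('item/*')})
    # Direct children of <bill>, except <item> itself, get a 'bill_' prefix
    row.update({'bill_' + b.tag: b.text
                for b in bill.findall('*') if b.tag != 'item'})
    return row


rows = [flatten_bill(root, bill) for bill in root.findall('bill')]
print(rows)
# [{'userID': '10269', 'item_itemID': '423999', 'item_library': 'BRANCH',
#   'bill_balance': '1.20', 'bill_library': 'MAIN'}]
```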

Going one step further, the data above can be loaded into a pandas DataFrame:

import pandas as pd
...

df = pd.DataFrame(data)

#   userID                       displayName firstName       middleName   lastName          callNumber copyNumber  itemID library dateCreated isPermanent amount   reason balance
# 0  10269  SAFIYA NASSER ABDULLAH AL SIYABI    SAFIYA  NASSER ABDULLAH  AL SIYABI   BP173.4 .B57 2003          1  423999    MAIN  2009-02-15        true   1.20  OVERDUE    1.20
# 1  10269  SAFIYA NASSER ABDULLAH AL SIYABI    SAFIYA  NASSER ABDULLAH  AL SIYABI  BP173.3 .G423 2004          2  429053    MAIN  2009-02-15        true   1.20  OVERDUE    1.20
# 2  10269  SAFIYA NASSER ABDULLAH AL SIYABI    SAFIYA  NASSER ABDULLAH  AL SIYABI   BP173.3 .N34 2003          1  423991    MAIN  2009-02-15        true  24.00  OVERDUE   24.00