Getting the correct result from complicated XML
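I want to output the correct userID, itemID, and associated balance for each bill, and then export the result.
Each 'user' can have many 'items', and each item has a balance. The userID is repeated for every item.
I am using the following code, but it repeats the itemID/userID pairs: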
userid = node.findtext('./userID')
itemids = node.findall('./bill/item/itemID')
bills = node.findall(".//bill/balance")
for item in itemids:
    for bill in bills:
        print(userid, item.text, bill.text)
Here is an example of the XML:
<user>
<userID>10269</userID>
<name>
<displayName>SAFIYA NASSER ABDULLAH AL SIYABI</displayName>
<firstName>SAFIYA</firstName>
<middleName>NASSER ABDULLAH</middleName>
<lastName>AL SIYABI</lastName>
</name>
<library>MAIN</library>
<numberOfBills>3</numberOfBills>
<bill>
<item>
<callNumber>BP173.4 .B57 2003</callNumber>
<copyNumber>1</copyNumber>
<itemID>423999</itemID>
<library>MAIN</library>
<dateCreated>2009-02-15</dateCreated>
<isPermanent>true</isPermanent>
</item>
<amount currency="OR">1.20</amount>
<reason>OVERDUE</reason>
<balance currency="OR">1.20</balance>
<library>MAIN</library>
</bill>
<bill>
<item>
<callNumber>BP173.3 .G423 2004</callNumber>
<copyNumber>2</copyNumber>
<itemID>429053</itemID>
<library>MAIN</library>
<dateCreated>2009-02-15</dateCreated>
<isPermanent>true</isPermanent>
</item>
<amount currency="OR">1.20</amount>
<reason>OVERDUE</reason>
<balance currency="OR">1.20</balance>
<library>MAIN</library>
</bill>
<bill>
<item>
<callNumber>BP173.3 .N34 2003</callNumber>
<copyNumber>1</copyNumber>
<itemID>423991</itemID>
<library>MAIN</library>
<dateCreated>2009-02-15</dateCreated>
<isPermanent>true</isPermanent>
</item>
<amount currency="OR">24.00</amount>
<reason>OVERDUE</reason>
<balance currency="OR">24.00</balance>
<library>MAIN</library>
</bill>
</user>
Thanks in advance.
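You are iterating over every <item>, and then, for each item, iterating over every <bill> from the beginning. In effect, you are using the length of node.findall('.//itemID') as the number of times to loop over all the bill tags, which is not what you want: with three bills you get nine printed lines, pairing every item with every balance.
Instead, iterate over each bill and, in a nested for loop, iterate over the items found under that specific bill rather than every item in the document.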
for bill in node.findall('bill'):
    # findtext returns the element's text directly
    balance = bill.findtext('balance')
    for item in bill.findall('item'):
        itemID = item.findtext('itemID')
        print(userid, itemID, balance)
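To make the fix above easier to try, here is a minimal self-contained sketch. It assumes a trimmed-down copy of the question's XML (only the userID, itemID, and balance elements are kept), and the names `SAMPLE_XML` and `bill_rows` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the sample XML from the question (assumption:
# only the elements needed for the output are kept).
SAMPLE_XML = """
<user>
  <userID>10269</userID>
  <bill>
    <item><itemID>423999</itemID></item>
    <balance currency="OR">1.20</balance>
  </bill>
  <bill>
    <item><itemID>429053</itemID></item>
    <balance currency="OR">1.20</balance>
  </bill>
  <bill>
    <item><itemID>423991</itemID></item>
    <balance currency="OR">24.00</balance>
  </bill>
</user>
"""


def bill_rows(user):
    """Return one (userID, itemID, balance) tuple per item.

    Each item is paired only with the balance of the bill that contains it.
    """
    userid = user.findtext('userID')
    rows = []
    for bill in user.findall('bill'):
        balance = bill.findtext('balance')
        # Only the items under this particular bill, not the whole document.
        for item in bill.findall('item'):
            rows.append((userid, item.findtext('itemID'), balance))
    return rows


node = ET.fromstring(SAMPLE_XML)
rows = bill_rows(node)
for row in rows:
    print(*row)
```

With the three bills above this yields three rows, one per bill, instead of the nine produced by the original nested loops.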
Consider list/dict comprehensions to extract selected XML data:
import xml.etree.ElementTree as et
doc = et.parse("Input.xml")
user_bill_list_of_dict = [{'userID': doc.findtext('userID'),
'itemID': b.find('item').findtext('itemID'),
'balance': b.findtext('balance')
} for b in doc.findall('bill')]
print(user_bill_list_of_dict)
# [{'userID': '10269', 'itemID': '423999', 'balance': '1.20'},
# {'userID': '10269', 'itemID': '429053', 'balance': '1.20'},
# {'userID': '10269', 'itemID': '423991', 'balance': '24.00'}]
You can even use dictionary merging (available in Python 3.5+) to extract all of the XML data:
data = [{**{'userID': doc.findtext('userID')},
**{n.tag:n.text for n in doc.findall('./name/*')},
**{i.tag:i.text for i in bill.findall('item/*')},
**{b.tag:b.text for b in bill.findall('*') if b.tag != 'item'},
} for bill in doc.findall('bill')]
print(data)
# [{'userID': '10269', 'displayName': 'SAFIYA NASSER ABDULLAH AL SIYABI',
# 'firstName': 'SAFIYA', 'middleName': 'NASSER ABDULLAH', 'lastName': 'AL SIYABI',
# 'callNumber': 'BP173.4 .B57 2003', 'copyNumber': '1', 'itemID': '423999',
# 'library': 'MAIN', 'dateCreated': '2009-02-15', 'isPermanent': 'true', 'amount': '1.20',
# 'reason': 'OVERDUE', 'balance': '1.20'},
# {'userID': '10269', 'displayName': 'SAFIYA NASSER ABDULLAH AL SIYABI',
# 'firstName': 'SAFIYA', 'middleName': 'NASSER ABDULLAH', 'lastName': 'AL SIYABI',
# 'callNumber': 'BP173.3 .G423 2004', 'copyNumber': '2', 'itemID': '429053',
# 'library': 'MAIN', 'dateCreated': '2009-02-15', 'isPermanent': 'true', 'amount': '1.20',
# 'reason': 'OVERDUE', 'balance': '1.20'},
# {'userID': '10269', 'displayName': 'SAFIYA NASSER ABDULLAH AL SIYABI',
# 'firstName': 'SAFIYA', 'middleName': 'NASSER ABDULLAH', 'lastName': 'AL SIYABI',
# 'callNumber': 'BP173.3 .N34 2003', 'copyNumber': '1', 'itemID': '423991',
# 'library': 'MAIN', 'dateCreated': '2009-02-15', 'isPermanent': 'true', 'amount': '24.00',
# 'reason': 'OVERDUE', 'balance': '24.00'}]
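One thing to watch with the merge above: both <item> and <bill> contain a <library> element, and when dictionaries are merged with `**`, the value from the later mapping wins for a duplicate key. In the sample data both values are MAIN, so nothing is visibly lost. The sketch below uses a hypothetical bill whose bill-level library is BRANCH (a value invented here, not from the question) to show the effect, along with one possible workaround of prefixing the keys.

```python
import xml.etree.ElementTree as ET

# Hypothetical bill: the bill-level library differs from the item-level
# one so that the key collision becomes visible.
BILL_XML = """
<bill>
  <item>
    <itemID>423999</itemID>
    <library>MAIN</library>
  </item>
  <balance currency="OR">1.20</balance>
  <library>BRANCH</library>
</bill>
"""

bill = ET.fromstring(BILL_XML)
item_fields = {i.tag: i.text for i in bill.findall('item/*')}
bill_fields = {b.tag: b.text for b in bill.findall('*') if b.tag != 'item'}

# Plain merge: the bill-level 'library' overwrites the item-level one.
merged = {**item_fields, **bill_fields}

# Prefixing the keys keeps both values.
prefixed = {**{f'item_{k}': v for k, v in item_fields.items()},
            **{f'bill_{k}': v for k, v in bill_fields.items()}}

print(merged)
print(prefixed)
```

Whether the prefixes are worth the longer column names depends on whether the two library values can actually differ in your data.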
Going a step further, the data above can be loaded into a pandas DataFrame:
import pandas as pd
...
df = pd.DataFrame(data)
# userID displayName firstName middleName lastName callNumber copyNumber itemID library dateCreated isPermanent amount reason balance
# 0 10269 SAFIYA NASSER ABDULLAH AL SIYABI SAFIYA NASSER ABDULLAH AL SIYABI BP173.4 .B57 2003 1 423999 MAIN 2009-02-15 true 1.20 OVERDUE 1.20
# 1 10269 SAFIYA NASSER ABDULLAH AL SIYABI SAFIYA NASSER ABDULLAH AL SIYABI BP173.3 .G423 2004 2 429053 MAIN 2009-02-15 true 1.20 OVERDUE 1.20
# 2 10269 SAFIYA NASSER ABDULLAH AL SIYABI SAFIYA NASSER ABDULLAH AL SIYABI BP173.3 .N34 2003 1 423991 MAIN 2009-02-15 true 24.00 OVERDUE 24.00
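Since the question also asks about exporting the result, here is a minimal sketch using only the standard library `csv` module. It assumes rows shaped like the list of dicts built earlier; the helper name `write_bills_csv` and the choice of columns are illustrative, and an in-memory buffer stands in for a real output file.

```python
import csv
import io


def write_bills_csv(rows, fileobj):
    """Write userID, itemID, and balance for each row as CSV."""
    fieldnames = ['userID', 'itemID', 'balance']
    # extrasaction='ignore' skips any extra keys present in the dicts.
    writer = csv.DictWriter(fileobj, fieldnames=fieldnames,
                            extrasaction='ignore')
    writer.writeheader()
    writer.writerows(rows)


# Rows copied from the sample output shown earlier in this answer.
sample_rows = [
    {'userID': '10269', 'itemID': '423999', 'balance': '1.20'},
    {'userID': '10269', 'itemID': '429053', 'balance': '1.20'},
    {'userID': '10269', 'itemID': '423991', 'balance': '24.00'},
]

buffer = io.StringIO()
write_bills_csv(sample_rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open(path, 'w', newline='')`, as the `csv` module documentation recommends.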