Sorting nested elements with elementtree
I'm trying to use ElementTree to sort heavily nested elements in an XML file by one of their attributes. The XML elements I need to read are structured like this:
<ChargeItem>
<!-- some other data-->
<SupplementaryOffer OfferId="SomeIdNumber">
<!-- more data-->
</SupplementaryOffer>
</ChargeItem>
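Because ChargeItem sits several levels deep, it matters which ElementTree call is used to reach it. The minimal sketch below uses an illustrative, shortened document (an assumption, not the real file). It shows that `findall("ChargeItem")` with a bare tag name only looks at direct children, while `iter("ChargeItem")` and the XPath-style `.//ChargeItem` search the whole subtree.

```python
import xml.etree.ElementTree as ET

# Illustrative document: ChargeItem is nested two levels below the root
SAMPLE = """
<BATCH>
  <BILL>
    <ChargeItem>
      <SupplementaryOffer OfferId="80000000"/>
      <SupplementaryOffer OfferId="80000132"/>
    </ChargeItem>
  </BILL>
</BATCH>
"""

root = ET.fromstring(SAMPLE)

# findall() with a bare tag name only matches direct children of root
direct = root.findall("ChargeItem")

# iter() and ".//" search every descendant, however deep
deep = list(root.iter("ChargeItem"))
also_deep = root.findall(".//ChargeItem")

# Read the OfferId of every SupplementaryOffer inside each ChargeItem
offer_ids = [offer.get("OfferId")
             for item in deep
             for offer in item.findall("SupplementaryOffer")]

print(len(direct), len(deep), len(also_deep))  # 0 1 1
print(offer_ids)  # ['80000000', '80000132']
```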
What my code needs to do:
- Parse the XML file and get the SupplementaryOffer tags.
- Read the OfferId attribute.
- Add a new OrderId attribute based on a lookup list.
- Sort the SupplementaryOffer elements within each ChargeItem by OrderId.
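The first three steps could look roughly like the sketch below. The lookup table `ORDER_LOOKUP` is hypothetical, since the question does not show the real lookup list; here it is assumed to be a dict from OfferId to an order number, with values copied from the sample XML further down. Attribute values have to be strings for ElementTree to serialize them, hence the `str()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup list: OfferId -> sort position (values taken from the sample XML)
ORDER_LOOKUP = {"80000000": 23, "80000132": 2, "80003606": 10}

SAMPLE = """
<ChargeItem SortKey="ICE">
  <SupplementaryOffer OfferId="80000000"/>
  <SupplementaryOffer OfferId="80000132"/>
  <SupplementaryOffer OfferId="99999999"/>
</ChargeItem>
"""

def add_order_ids(root, lookup):
    """Find every SupplementaryOffer, read its OfferId, and set OrderId if known."""
    for offer in root.iter("SupplementaryOffer"):
        offer_id = offer.get("OfferId")
        if offer_id in lookup:
            # Store as a string; non-string attribute values fail at serialization
            offer.set("OrderId", str(lookup[offer_id]))

root = ET.fromstring(SAMPLE)
add_order_ids(root, ORDER_LOOKUP)
order_ids = [o.get("OrderId") for o in root.iter("SupplementaryOffer")]
print(order_ids)  # ['23', '2', None]
```

Offers whose OfferId is missing from the lookup are simply left without an OrderId here; whether that is the right policy depends on the real data.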
What I need to do now is step 4: sort the SupplementaryOffer elements.
This is the code I'm using to try to sort them:
for c in tree.iter("ChargeItem"):
    c[:] = sorted(c, key=lambda child: (child.tag, child.get('OrderId')))
As far as I can tell, it doesn't work at all.
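Two things in that snippet are worth checking, though without the full script this is a guess rather than a confirmed diagnosis. First, `get()` is case-sensitive: if the attribute was written as `OrderID` but the sort reads `OrderId`, every key becomes `(tag, None)`, all keys compare equal, and the stable `sorted()` leaves the order unchanged. Second, even with the right name the values are strings, so `"10"` sorts before `"2"`. The sketch below converts the key to an integer, puts children without an `OrderId` last, and writes the tree back out, since sorting the in-memory tree does not change the file until `write()` is called. The sample XML is made up to show the string-versus-number difference.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = """
<BATCH>
  <ChargeItem SortKey="NODSP">
    <SupplementaryOffer OfferId="80003606" OrderId="10"/>
    <SupplementaryOffer OfferId="80000132" OrderId="2"/>
  </ChargeItem>
</BATCH>
"""

def order_key(child):
    """Sort by numeric OrderId; children without one go last, in original order."""
    value = child.get("OrderId")
    return (value is None, int(value) if value is not None else 0)

def sort_charge_items(tree):
    # iter() finds every ChargeItem, however deeply nested
    for item in tree.iter("ChargeItem"):
        item[:] = sorted(item, key=order_key)

tree = ET.ElementTree(ET.fromstring(SAMPLE))
sort_charge_items(tree)

# Write the sorted tree out; a real script would pass a filename instead
out = io.BytesIO()
tree.write(out, encoding="utf-8", xml_declaration=True)

result = [o.get("OrderId") for o in tree.iter("SupplementaryOffer")]
print(result)  # ['2', '10']
```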
Below is a trimmed-down version of the existing XML, with the 'OrderId' attribute already added:
<BATCH >
<BILL>
<Somebillinfo>0</Somebillinfo>
<INVtype>2</INVtype>
<PageOne>
<Stuff></Stuff>
</PageOne>
<Page2>
<ServiceAddressCharges>
<ServiceAddress>
<ServiceAddress1>221B Baker Street</ServiceAddress1>
</ServiceAddress>
<ProductsSection>
<BrilliantProducts id="20033" DisplayMethod="0">Snack services
<ChargeItemList>
<ServiceNo>0123456478</ServiceNo>
<PrimaryOffer OfferId="80000000">Blueberry Icecream</PrimaryOffer>
<ParentBundle SortKey="NO_BUNDLE" ParentBundleId="0" ConnectReason="0" DisconnectReason="0">
<Bundle SortKey="NO_BUNDLE" BundleId="0" ConnectReason="0" DisconnectReason="0">
<ChargeItem SortKey="ICE">
<SupplementaryOffer OfferId="80000000" ConnectReason="1" DisconnectReason="0" OrderId="23">Fishfingers & Custard
<MonthAmount ProrateCode="0" BillRecur="4" FromDate="2019-07-11" ToDate="2019-08-10" Discount="0.00" Qty="1">4.00</MonthAmount>
</SupplementaryOffer>
<SupplementaryOffer OfferId="80000132" ConnectReason="1" DisconnectReason="0" OrderId="2">A large amount of potato
<MonthAmount ProrateCode="0" BillRecur="71" FromDate="2019-07-11" ToDate="2019-08-10" Discount="0.00" Qty="1">1.00</MonthAmount>
</SupplementaryOffer>
</ChargeItem>
<ChargeItem SortKey="NODSP">
<SupplementaryOffer OfferId="80003606" ConnectReason="1" DisconnectReason="0" OrderId="10">Smaller amount of potato
<DateStart>2016-11-04</DateStart>
<UsageAmount Discount="627.68">630.13</UsageAmount>
<UsageItem>
<ChargeDescr>IncludedSnacks</ChargeDescr>
</UsageItem>
<UsageItem>
<ChargeDescr>SharedSnacks</ChargeDescr>
</UsageItem>
</SupplementaryOffer>
<SupplementaryOffer OfferId="80000132" ConnectReason="1" DisconnectReason="0" OrderId="2">A ginormous amount of potato
<MonthAmount ProrateCode="0" BillRecur="71" FromDate="2019-07-11" ToDate="2019-08-10" Discount="0.00" Qty="1">1.00</MonthAmount>
</SupplementaryOffer>
</ChargeItem>
</Bundle>
</ParentBundle>
</ChargeItemList>
</BrilliantProducts>
</ProductsSection>
</ServiceAddressCharges>
</Page2>
</BILL>
</BATCH >
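One thing to check before handing XML like this to ElementTree: a bare `&` (for example an unescaped `Fishfingers & Custard`) is not well-formed XML, so `ET.fromstring()` or `ET.parse()` raises `ET.ParseError`; the ampersand has to be written as `&amp;`. Whether the real file is escaped is not stated, so treat this as a check rather than a diagnosis. A quick demonstration:

```python
import xml.etree.ElementTree as ET

RAW = '<SupplementaryOffer OrderId="23">Fishfingers & Custard</SupplementaryOffer>'
ESCAPED = '<SupplementaryOffer OrderId="23">Fishfingers &amp; Custard</SupplementaryOffer>'

def try_parse(text):
    """Return the parsed element, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        print("ParseError:", err)
        return None

bad = try_parse(RAW)       # bare '&' is rejected by the parser
good = try_parse(ESCAPED)  # '&amp;' is decoded back to '&' in .text
print(bad is None, good.text)  # True Fishfingers & Custard
```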
Sorted by the 'OrderId' attribute, I expect the first 'ChargeItem' to come out as:
<ChargeItem SortKey="ICE">
<SupplementaryOffer OfferId="80000132" ConnectReason="1" DisconnectReason="0" OrderId="2">A large amount of potato
<MonthAmount ProrateCode="0" BillRecur="71" FromDate="2019-07-11" ToDate="2019-08-10" Discount="0.00" Qty="1">1.00</MonthAmount>
</SupplementaryOffer>
<SupplementaryOffer OfferId="80000000" ConnectReason="1" DisconnectReason="0" OrderId="23">Fishfingers & Custard
<MonthAmount ProrateCode="0" BillRecur="4" FromDate="2019-07-11" ToDate="2019-08-10" Discount="0.00" Qty="1">4.00</MonthAmount>
</SupplementaryOffer>
</ChargeItem>
Another approach, using simplified_scrapy:
from simplified_scrapy import SimplifiedDoc
html = '''
<ChargeItem SortKey="ICE">
<SupplementaryOffer OfferId="80000000" ConnectReason="1" DisconnectReason="0" OrderId="23">Fishfingers & Custard
<MonthAmount ProrateCode="0" BillRecur="4" FromDate="2019-07-11" ToDate="2019-08-10" Discount="0.00" Qty="1">4.00</MonthAmount>
</SupplementaryOffer>
<SupplementaryOffer OfferId="80000132" ConnectReason="1" DisconnectReason="0" OrderId="2">A large amount of potato
<MonthAmount ProrateCode="0" BillRecur="71" FromDate="2019-07-11" ToDate="2019-08-10" Discount="0.00" Qty="1">1.00</MonthAmount>
</SupplementaryOffer>
</ChargeItem>
'''
doc = SimplifiedDoc(html)
ChargeItem = doc.select('ChargeItem@SortKey="ICE"')
SupplementaryOffers = ChargeItem.selects('SupplementaryOffer')
xmls = {}  # Cache each node's XML, keyed by its OrderId
for e in SupplementaryOffers:
    xmls[int(e.OrderId)] = e.outerHtml
Orderids = [int(oid) for oid in SupplementaryOffers.select('OrderId()')]
Orderids.sort()  # Sort the OrderIds numerically; nodes are replaced in this order
for e, oid in zip(SupplementaryOffers, Orderids):  # Replace nodes in sorted order
    e.repleaceSelf(xmls[oid])
print(doc.html)  # Output the sorted XML
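One caveat with this approach: the `xmls` cache is keyed by `OrderId`, so if two offers in the same ChargeItem ever share an `OrderId`, the later one overwrites the earlier and one offer would be lost when the nodes are replaced. The sketch below uses plain `xml.etree.ElementTree` rather than simplified_scrapy, with an invented duplicate `OrderId` for illustration, to contrast a dict keyed by `OrderId` with `sorted()`, which keeps every element.

```python
import xml.etree.ElementTree as ET

# Invented example: two offers share OrderId="2"
SAMPLE = """
<ChargeItem SortKey="ICE">
  <SupplementaryOffer OfferId="80000000" OrderId="23"/>
  <SupplementaryOffer OfferId="80000132" OrderId="2"/>
  <SupplementaryOffer OfferId="80003606" OrderId="2"/>
</ChargeItem>
"""

item = ET.fromstring(SAMPLE)
offers = item.findall("SupplementaryOffer")

# Keyed cache: the second OrderId="2" overwrites the first
by_order_id = {int(o.get("OrderId")): o for o in offers}

# sorted() keeps every element; ties stay in document order (stable sort)
ordered = sorted(offers, key=lambda o: int(o.get("OrderId")))

print(len(by_order_id), len(ordered))  # 2 3
print([o.get("OfferId") for o in ordered])
# ['80000132', '80003606', '80000000']
```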
Result:
<ChargeItem SortKey="ICE">
<SupplementaryOffer OfferId="80000132" ConnectReason="1" DisconnectReason="0" OrderId="2">A large amount of potato
<MonthAmount ProrateCode="0" BillRecur="71" FromDate="2019-07-11" ToDate="2019-08-10" Discount="0.00" Qty="1">1.00</MonthAmount>
</SupplementaryOffer>
<SupplementaryOffer OfferId="80000000" ConnectReason="1" DisconnectReason="0" OrderId="23">Fishfingers & Custard
<MonthAmount ProrateCode="0" BillRecur="4" FromDate="2019-07-11" ToDate="2019-08-10" Discount="0.00" Qty="1">4.00</MonthAmount>
</SupplementaryOffer>
</ChargeItem>
More examples, including parsing and updating, are available here: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples