Most efficient way to convert one XML to a different XML file in Python (xmltodict, ElementTree, etc.)
Hello,
I have the following two XML files.
File A:
<?xml version="1.0" encoding="UTF-8"?>
<GetShipmentUpdatesResult>
<Shipments>
<Shipment>
<Container>
<OrderNumber>5108046</OrderNumber>
<ContainerNumber>5108046_1</ContainerNumber>
<CustomerOrderNumber>abcq123</CustomerOrderNumber>
<ShipDate>2015-07-12T12:00:00</ShipDate>
<CarrierName>UPS</CarrierName>
<TrackingNumber>1ZX20520A803682850</TrackingNumber>
<StatusCode>InTransit</StatusCode>
<Events>
<TrackingEvent>
<TimeStamp>2015-06-29T13:53:18</TimeStamp>
<City></City>
<StateOrProvince></StateOrProvince>
<Description>manifested from Warehouse</Description>
<TrackingStatus>Manifest</TrackingStatus>
</TrackingEvent>
<TrackingEvent>
<TimeStamp>2015-06-29T18:47:44</TimeStamp>
<City>Glenwillow</City>
<StateOrProvince>OH</StateOrProvince>
<Description>Status: AF Recorded</Description>
<TrackingStatus>In Transit</TrackingStatus>
</TrackingEvent>
</Events>
</Container>
</Shipment>
<Shipment>
<Container>
<OrderNumber>456789</OrderNumber>
<ContainerNumber>44789</ContainerNumber>
<CustomerOrderNumber>abcq123</CustomerOrderNumber>
<ShipDate>2015-07-03T13:56:27</ShipDate>
<CarrierName>UP2</CarrierName>
<TrackingNumber>1Z4561230020</TrackingNumber>
<StatusCode>IN_TRANSIT</StatusCode>
<Events>
<TrackingEvent>
<TimeStamp>2015-07-03T13:56:27</TimeStamp>
<City>Glenwillow</City>
<StateOrProvince>OH</StateOrProvince>
<Description>manifested from Warehouse</Description>
<TrackingStatus>Manifest</TrackingStatus>
</TrackingEvent>
</Events>
</Container>
</Shipment>
</Shipments>
<MatchingRecords>2</MatchingRecords>
<RequestId></RequestId>
<RecordsRemaining>0</RecordsRemaining>
</GetShipmentUpdatesResult>
File B:
<?xml version="1.0" encoding="UTF-8"?>
<getShipmentStatusResponse>
<getShipmentStatusResult>
<outcome>
<result>Success</result>
<error></error>
</outcome>
<shipments>
<shipment>
<orderID>123456</orderID>
<containerNo>CD1863663C</containerNo>
<shipDate>2015-06-29T18:47:44</shipDate>
<carrier>UPS</carrier>
<trackingNumber>1Z4561230001</trackingNumber>
<statusCode>IN_TRANSIT</statusCode>
<statusMessage>In Transit</statusMessage>
<shipmentEvents>
<trackingUpdate>
<timeStamp>2015-06-29T13:53:18</timeStamp>
<city />
<state />
<trackingMessage>Manifest</trackingMessage>
</trackingUpdate>
<trackingUpdate>
<timeStamp>2015-06-29T18:47:44</timeStamp>
<city>Glenwillow</city>
<state>OH</state>
<trackingMessage>Shipped from warehouse</trackingMessage>
</trackingUpdate>
</shipmentEvents>
</shipment>
<shipment>
<orderID>456789</orderID>
<containerNo>44789</containerNo>
<shipDate>2015-07-03T13:56:27</shipDate>
<carrier>UP2</carrier>
<trackingNumber>1Z4561230020</trackingNumber>
<statusCode>IN_TRANSIT</statusCode>
<statusMessage>In Transit</statusMessage>
<shipmentEvents>
<trackingUpdate>
<timeStamp>2015-07-03T13:56:27</timeStamp>
<city>Glenwillow</city>
<state>OH</state>
<trackingMessage>Manifest</trackingMessage>
</trackingUpdate>
</shipmentEvents>
</shipment>
</shipments>
<matchingRecords>2</matchingRecords>
<requestId></requestId>
<remainingRecords>0</remainingRecords>
</getShipmentStatusResult>
</getShipmentStatusResponse>
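I basically need to read through file A and change it to look like file B. So far I have been using xmltodict to parse file A, but it only reads the top-level elements. It looks like I would have to write multiple for loops to do this with xmltodict: one loop over each parent element, then further loops over its children.
Looking at ElementTree, it seems to be the same. Does anyone know another way to do this without multiple for loops?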
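Since your output is more or less an exact mapping of the input, with only the element names seeming to differ, I suggest you use XSLT to do the transformation declaratively.
Assuming every input element name maps unconditionally to one output element name (which is what it looks like, judging by your sample), here is an XSLT 1.0 transformation to get you started (basic instructions on how to use XSLT in Python can be found in this answer):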
<xsl:transform version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:my="http://tempuri.org/config"
  exclude-result-prefixes="my"
>
  <xsl:output method="xml" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*" />

  <my:config>
    <nameMap from="Shipments" to="shipments" />
    <nameMap from="Shipment" to="shipment" />
    <nameMap from="Container" to="-" />
  </my:config>
  <xsl:variable name="nameMap" select="document('')/*/my:config/nameMap" />

  <xsl:template match="node() | @*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <getShipmentStatusResponse>
      <xsl:apply-templates select="@* | node()" />
    </getShipmentStatusResponse>
  </xsl:template>

  <xsl:template match="GetShipmentUpdatesResult">
    <getShipmentStatusResult>
      <outcome>
        <result>Success</result>
        <error></error>
      </outcome>
      <xsl:apply-templates select="@* | node()" />
    </getShipmentStatusResult>
  </xsl:template>

  <xsl:template match="*">
    <xsl:variable name="map" select="$nameMap[@from = name(current())]" />
    <xsl:choose>
      <xsl:when test="$map/@to = '-'">
        <xsl:apply-templates select="@* | node()" />
      </xsl:when>
      <xsl:when test="$map/@to != ''">
        <xsl:element name="{$map/@to}">
          <xsl:apply-templates select="@* | node()" />
        </xsl:element>
      </xsl:when>
      <xsl:when test="$map/@to = ''" />
      <xsl:otherwise>
        <xsl:call-template name="identity" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:transform>
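The transformation approaches the problem as follows:
- At its core it is the identity transform: any node that is not matched by a more specific template is copied to the output as-is.
- It contains an in-place configuration section (<my:config>) where you can put <nameMap> elements to map input names to output names. This works by the following conventions (implemented in <xsl:template match="*">, a few lines further down):
  - If an input element matches any @from and @to is populated, the element is renamed and its children are processed.
  - If an input element matches any @from and @to is '-', the element is removed, but its children are still processed.
  - If an input element matches any @from and @to is empty, it is removed from the output completely.
  - In all other cases, the input element is copied 1:1 via the identity template.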
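If you would rather stay in plain Python, the same four conventions can probably be sketched with the standard library's xml.etree.ElementTree and a single recursive function instead of nested for loops. This is not the XSLT above, only an illustration of the same idea. The names transform and NAME_MAP, and the extra rule that drops CustomerOrderNumber, are made up for this example. For simplicity it ignores tail text and the whitespace between elements.

```python
import xml.etree.ElementTree as ET

# Same convention as the <nameMap> rules: a name renames, "-" unwraps, "" drops.
NAME_MAP = {
    "Shipments": "shipments",
    "Shipment": "shipment",
    "Container": "-",
}

def transform(elem, name_map):
    """Return the list of output elements produced from one input element."""
    target = name_map.get(elem.tag)
    if target == "":
        return []  # dropped together with its whole subtree
    # Each child may yield zero, one or several output elements.
    children = [out for child in elem for out in transform(child, name_map)]
    if target == "-":
        return children  # element itself removed, children promoted
    # No rule (target is None) means: keep the original name.
    new = ET.Element(target if target is not None else elem.tag, dict(elem.attrib))
    new.text = elem.text
    new.extend(children)
    return [new]

sample = """<Shipments><Shipment><Container>
<OrderNumber>5108046</OrderNumber><CustomerOrderNumber>abcq123</CustomerOrderNumber>
</Container></Shipment></Shipments>"""

# Hypothetical extra rule, added only to exercise the "empty @to" case.
rules = dict(NAME_MAP, CustomerOrderNumber="")
result = transform(ET.fromstring(sample), rules)[0]
print(ET.tostring(result, encoding="unicode"))
```

The recursion does the work that the nested loops would otherwise do, so the nesting depth of the input does not matter. The wrapper elements (getShipmentStatusResponse, outcome) are left out here and would still have to be added around the result.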
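The output of the transformation currently looks like this. Add more <nameMap> rules to define the behavior for the rest of the input elements.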
<getShipmentStatusResponse>
<getShipmentStatusResult>
<outcome>
<result>Success</result>
<error />
</outcome>
<shipments>
<shipment>
<OrderNumber>5108046</OrderNumber>
<ContainerNumber>5108046_1</ContainerNumber>
<CustomerOrderNumber>abcq123</CustomerOrderNumber>
<ShipDate>2015-07-12T12:00:00</ShipDate>
<CarrierName>UPS</CarrierName>
<TrackingNumber>1ZX20520A803682850</TrackingNumber>
<StatusCode>InTransit</StatusCode>
<Events>
<TrackingEvent>
<TimeStamp>2015-06-29T13:53:18</TimeStamp>
<City />
<StateOrProvince />
<Description>manifested from Warehouse</Description>
<TrackingStatus>Manifest</TrackingStatus>
</TrackingEvent>
<TrackingEvent>
<TimeStamp>2015-06-29T18:47:44</TimeStamp>
<City>Glenwillow</City>
<StateOrProvince>OH</StateOrProvince>
<Description>Status: AF Recorded</Description>
<TrackingStatus>In Transit</TrackingStatus>
</TrackingEvent>
</Events>
</shipment>
<shipment>
<OrderNumber>456789</OrderNumber>
<ContainerNumber>44789</ContainerNumber>
<CustomerOrderNumber>abcq123</CustomerOrderNumber>
<ShipDate>2015-07-03T13:56:27</ShipDate>
<CarrierName>UP2</CarrierName>
<TrackingNumber>1Z4561230020</TrackingNumber>
<StatusCode>IN_TRANSIT</StatusCode>
<Events>
<TrackingEvent>
<TimeStamp>2015-07-03T13:56:27</TimeStamp>
<City>Glenwillow</City>
<StateOrProvince>OH</StateOrProvince>
<Description>manifested from Warehouse</Description>
<TrackingStatus>Manifest</TrackingStatus>
</TrackingEvent>
</Events>
</shipment>
</shipments>
<MatchingRecords>2</MatchingRecords>
<RequestId />
<RecordsRemaining>0</RecordsRemaining>
</getShipmentStatusResult>
</getShipmentStatusResponse>
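Most of the elements that are still untouched in this output only need a new name. As a rough illustration in Python (the language of the question), a flat dictionary and a single pass over Element.iter() would be enough for pure renames. The target names below are read off file B and are my guess at the intended mapping. Elements where the samples do not line up clearly (Description, TrackingStatus, statusMessage, the record counters) are left out on purpose. RENAMES and rename_in_place are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Guessed from comparing file A with file B; not a confirmed specification.
RENAMES = {
    "OrderNumber": "orderID",
    "ContainerNumber": "containerNo",
    "ShipDate": "shipDate",
    "CarrierName": "carrier",
    "TrackingNumber": "trackingNumber",
    "StatusCode": "statusCode",
    "Events": "shipmentEvents",
    "TrackingEvent": "trackingUpdate",
    "TimeStamp": "timeStamp",
    "City": "city",
    "StateOrProvince": "state",
}

def rename_in_place(root, renames):
    # iter() visits the element and every descendant, whatever the depth,
    # so one loop covers the whole tree.
    for elem in root.iter():
        elem.tag = renames.get(elem.tag, elem.tag)
    return root

fragment = ET.fromstring(
    "<shipment><OrderNumber>456789</OrderNumber>"
    "<Events><TrackingEvent><City>Glenwillow</City>"
    "<StateOrProvince>OH</StateOrProvince></TrackingEvent></Events></shipment>"
)
rename_in_place(fragment, RENAMES)
print(ET.tostring(fragment, encoding="unicode"))
```

The same entries could just as well be written as additional <nameMap from="..." to="..." /> lines in the stylesheet. Removing or unwrapping elements is not covered by this rename-only pass.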