Most efficient way to convert one XML file to a different XML file in Python (xmltodict, ElementTree, etc.)

Hello,

I have the following two XML files.

File A:

<?xml version="1.0" encoding="UTF-8"?>
<GetShipmentUpdatesResult>
    <Shipments>
        <Shipment>
            <Container>
                <OrderNumber>5108046</OrderNumber>
                <ContainerNumber>5108046_1</ContainerNumber>
                <CustomerOrderNumber>abcq123</CustomerOrderNumber>
                <ShipDate>2015-07-12T12:00:00</ShipDate>
                <CarrierName>UPS</CarrierName>
                <TrackingNumber>1ZX20520A803682850</TrackingNumber>
                <StatusCode>InTransit</StatusCode>
                <Events>
                    <TrackingEvent>
                        <TimeStamp>2015-06-29T13:53:18</TimeStamp>
                        <City></City>
                        <StateOrProvince></StateOrProvince>
                        <Description>manifested from Warehouse</Description>
                        <TrackingStatus>Manifest</TrackingStatus>
                    </TrackingEvent>
                    <TrackingEvent>
                        <TimeStamp>2015-06-29T18:47:44</TimeStamp>
                        <City>Glenwillow</City>
                        <StateOrProvince>OH</StateOrProvince>
                        <Description>Status: AF Recorded</Description>
                        <TrackingStatus>In Transit</TrackingStatus>
                    </TrackingEvent>
                </Events>
            </Container>
        </Shipment>
        <Shipment>
            <Container>
                <OrderNumber>456789</OrderNumber>
                <ContainerNumber>44789</ContainerNumber>
                <CustomerOrderNumber>abcq123</CustomerOrderNumber>
                <ShipDate>2015-07-03T13:56:27</ShipDate>
                <CarrierName>UP2</CarrierName>
                <TrackingNumber>1Z4561230020</TrackingNumber>
                <StatusCode>IN_TRANSIT</StatusCode>
                <Events>
                    <TrackingEvent>
                        <TimeStamp>2015-07-03T13:56:27</TimeStamp>
                        <City>Glenwillow</City>
                        <StateOrProvince>OH</StateOrProvince>
                        <Description>manifested from Warehouse</Description>
                        <TrackingStatus>Manifest</TrackingStatus>
                    </TrackingEvent>
                </Events>
            </Container>
        </Shipment>
    </Shipments>
    <MatchingRecords>2</MatchingRecords>
    <RequestId></RequestId>
    <RecordsRemaining>0</RecordsRemaining>
</GetShipmentUpdatesResult>

File B:

<?xml version="1.0" encoding="UTF-8"?>
<getShipmentStatusResponse>
    <getShipmentStatusResult>
        <outcome>
            <result>Success</result>
            <error></error>
        </outcome>
        <shipments>
            <shipment>
                <orderID>123456</orderID>
                <containerNo>CD1863663C</containerNo>
                <shipDate>2015-06-29T18:47:44</shipDate>
                <carrier>UPS</carrier>
                <trackingNumber>1Z4561230001</trackingNumber>
                <statusCode>IN_TRANSIT</statusCode>
                <statusMessage>In Transit</statusMessage>
                <shipmentEvents>
                    <trackingUpdate>
                        <timeStamp>2015-06-29T13:53:18</timeStamp>
                        <city />
                        <state />
                        <trackingMessage>Manifest</trackingMessage>
                    </trackingUpdate>
                    <trackingUpdate>
                        <timeStamp>2015-06-29T18:47:44</timeStamp>
                        <city>Glenwillow</city>
                        <state>OH</state>
                        <trackingMessage>Shipped from warehouse</trackingMessage>
                    </trackingUpdate>
                </shipmentEvents>
            </shipment>
            <shipment>
                <orderID>456789</orderID>
                <containerNo>44789</containerNo>
                <shipDate>2015-07-03T13:56:27</shipDate>
                <carrier>UP2</carrier>
                <trackingNumber>1Z4561230020</trackingNumber>
                <statusCode>IN_TRANSIT</statusCode>
                <statusMessage>In Transit</statusMessage>
                <shipmentEvents>
                    <trackingUpdate>
                        <timeStamp>2015-07-03T13:56:27</timeStamp>
                        <city>Glenwillow</city>
                        <state>OH</state>
                        <trackingMessage>Manifest</trackingMessage>
                    </trackingUpdate>
                </shipmentEvents>
            </shipment>
        </shipments>
        <matchingRecords>2</matchingRecords>
        <requestId></requestId>
        <remainingRecords>0</remainingRecords>
    </getShipmentStatusResult>
</getShipmentStatusResponse>

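I basically need to read through File A and change it to look like File B. So far I have been using xmltodict to parse File A, but it only reads the top-level element. It looks like I would have to write multiple nested for loops to do this with xmltodict: one loop over each parent element and another over its children.

Looking at ElementTree, it seems to be the same. Does anyone know another way to do this without writing multiple for loops?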

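Since your output is more or less an exact mapping of your input (only the element names seem to differ), I suggest using XSLT to do the transformation declaratively.

Assuming that every input element name unconditionally maps to one output element name (judging by your samples, that looks to be the case), here is an XSLT 1.0 transformation to get you started (basic instructions on how to use XSLT in Python can be found in this answer):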

<xsl:transform version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:my="http://tempuri.org/config"
  exclude-result-prefixes="my"
>
  <xsl:output method="xml" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*" />

  <my:config>
    <nameMap from="Shipments" to="shipments" />
    <nameMap from="Shipment" to="shipment" />
    <nameMap from="Container" to="-" />
  </my:config>
  <xsl:variable name="nameMap" select="document('')/*/my:config/nameMap" />

  <xsl:template match="node() | @*" name="identity">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <getShipmentStatusResponse>
      <xsl:apply-templates select="@* | node()" />
    </getShipmentStatusResponse>
  </xsl:template>

  <xsl:template match="GetShipmentUpdatesResult">
    <getShipmentStatusResult>
      <outcome>
        <result>Success</result>
        <error></error>
      </outcome>
      <xsl:apply-templates select="@* | node()" />
    </getShipmentStatusResult>
  </xsl:template>

  <xsl:template match="*">
    <xsl:variable name="map" select="$nameMap[@from = name(current())]" />
    <xsl:choose>
      <xsl:when test="$map/@to = '-'">
        <xsl:apply-templates select="@* | node()" />
      </xsl:when>
      <xsl:when test="$map/@to != ''">
        <xsl:element name="{$map/@to}">
          <xsl:apply-templates select="@* | node()" />
        </xsl:element>
      </xsl:when>
      <xsl:when test="$map/@to = ''" />
      <xsl:otherwise>
        <xsl:call-template name="identity" />
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:transform>
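For reference, here is a minimal sketch of how a stylesheet like the one above might be applied from Python. It assumes the third-party lxml package is installed (the standard library's ElementTree has no XSLT support); the function name apply_stylesheet and the file paths passed to it are hypothetical, not part of the original question.

```python
def apply_stylesheet(xml_path, xslt_path):
    """Run the XSLT stylesheet at xslt_path over the XML file at xml_path."""
    # Assumption: lxml is available. It is imported here, not at module
    # level, so the module still loads on systems without lxml.
    from lxml import etree

    # Parsing from files gives the stylesheet a base URL.
    stylesheet = etree.parse(xslt_path)
    transform = etree.XSLT(stylesheet)

    # Calling the transform returns a result tree; str() serializes it
    # according to the stylesheet's <xsl:output> settings.
    result = transform(etree.parse(xml_path))
    return str(result)
```

With File A saved as, say, file_a.xml and the stylesheet as convert.xsl (both names are placeholders), apply_stylesheet("file_a.xml", "convert.xsl") would return the transformed document as a string.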

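The transformation works as follows:

  • At its core it is an identity transform: any node that is not matched by a dedicated template is copied to the output as-is.
  • It contains an in-place configuration section (<my:config>) where you can put <nameMap> elements that map input names to output names. This works by the following convention (implemented in <xsl:template match="*">):

    • If an input element matches any @from and @to is populated, the element is renamed and its children are processed.
    • If an input element matches any @from and @to is '-', the element itself is dropped, but its children are still processed.
    • If an input element matches any @from and @to is empty, it is removed from the output completely.
    • In all other cases the input element is copied 1:1 through the identity template.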
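If XSLT is not an option, the same four rules could be sketched in plain Python with a single recursive function instead of nested for loops. This is only an illustration of the convention, not a replacement for the stylesheet: the names NAME_MAP and convert are made up for this example, and it relies on nothing beyond the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Same three rules as the <my:config> section of the stylesheet.
NAME_MAP = {"Shipments": "shipments", "Shipment": "shipment", "Container": "-"}


def convert(elem, name_map):
    """Return the list of output elements produced from one input element."""
    # Names without a rule map to themselves (the identity case).
    target = name_map.get(elem.tag, elem.tag)
    if target == "":
        return []  # empty target: drop the element and its whole subtree
    # Each recursive call returns zero or more elements, so flatten them.
    children = [out for child in elem for out in convert(child, name_map)]
    if target == "-":
        return children  # '-': drop the wrapper, keep its converted children
    new = ET.Element(target, dict(elem.attrib))
    # Whitespace-only text and all tail text are skipped in this sketch.
    if elem.text and elem.text.strip():
        new.text = elem.text
    new.extend(children)
    return [new]


src = ET.fromstring(
    "<Shipments><Shipment><Container>"
    "<OrderNumber>5108046</OrderNumber>"
    "</Container></Shipment></Shipments>"
)
(result,) = convert(src, NAME_MAP)  # assumes the root itself is kept
print(ET.tostring(result, encoding="unicode"))
# -> <shipments><shipment><OrderNumber>5108046</OrderNumber></shipment></shipments>
```

Returning a list from every call is what lets one function cover all four cases: a dropped element yields an empty list, an unwrapped element yields its children, and everything else yields a single element.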

The stylesheet's current output looks like this. Add more <nameMap> rules to define the behavior for the remaining input elements.

<getShipmentStatusResponse>
  <getShipmentStatusResult>
    <outcome>
      <result>Success</result>
      <error />
    </outcome>
    <shipments>
      <shipment>
        <OrderNumber>5108046</OrderNumber>
        <ContainerNumber>5108046_1</ContainerNumber>
        <CustomerOrderNumber>abcq123</CustomerOrderNumber>
        <ShipDate>2015-07-12T12:00:00</ShipDate>
        <CarrierName>UPS</CarrierName>
        <TrackingNumber>1ZX20520A803682850</TrackingNumber>
        <StatusCode>InTransit</StatusCode>
        <Events>
          <TrackingEvent>
            <TimeStamp>2015-06-29T13:53:18</TimeStamp>
            <City />
            <StateOrProvince />
            <Description>manifested from Warehouse</Description>
            <TrackingStatus>Manifest</TrackingStatus>
          </TrackingEvent>
          <TrackingEvent>
            <TimeStamp>2015-06-29T18:47:44</TimeStamp>
            <City>Glenwillow</City>
            <StateOrProvince>OH</StateOrProvince>
            <Description>Status: AF Recorded</Description>
            <TrackingStatus>In Transit</TrackingStatus>
          </TrackingEvent>
        </Events>
      </shipment>
      <shipment>
        <OrderNumber>456789</OrderNumber>
        <ContainerNumber>44789</ContainerNumber>
        <CustomerOrderNumber>abcq123</CustomerOrderNumber>
        <ShipDate>2015-07-03T13:56:27</ShipDate>
        <CarrierName>UP2</CarrierName>
        <TrackingNumber>1Z4561230020</TrackingNumber>
        <StatusCode>IN_TRANSIT</StatusCode>
        <Events>
          <TrackingEvent>
            <TimeStamp>2015-07-03T13:56:27</TimeStamp>
            <City>Glenwillow</City>
            <StateOrProvince>OH</StateOrProvince>
            <Description>manifested from Warehouse</Description>
            <TrackingStatus>Manifest</TrackingStatus>
          </TrackingEvent>
        </Events>
      </shipment>
    </shipments>
    <MatchingRecords>2</MatchingRecords>
    <RequestId />
    <RecordsRemaining>0</RecordsRemaining>
  </getShipmentStatusResult>
</getShipmentStatusResponse>