Node Selection in XML Document

I'm working with an unstructured (flat) XML document that I need to convert into a structured one. The unstructured document looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<CustomerInformation>
    <CustomerPurchaseID>String</CustomerPurchaseID>
    <MemberAddress>String</MemberAddress>
    <MemberID>String</MemberID>
    <MemberCity>String</MemberCity>
    <MemberName>String</MemberName>
    <MemberType>String</MemberType>
    <MemberState>String</MemberState>
    <MemberSince>String</MemberSince>
    <PurchaseDate>String</PurchaseDate>
    <CreditCardName></CreditCardName>
    <CreditCardExpirration></CreditCardExpirration>
    <Orders>
        <LineItemCode>String</LineItemCode>
        <LineItemID>String</LineItemID>
        <LineItemDescription>String</LineItemDescription>
        <DiscountCode>String</DiscountCode>
    </Orders>
    <Orders>
        <LineItemCode>String</LineItemCode>
        <LineItemID>String</LineItemID>
        <LineItemDescription>String</LineItemDescription>
        <DiscountCode>String</DiscountCode>
    </Orders>
    <ShipToAddress>String</ShipToAddress>
    <ShipToCity>String</ShipToCity>
    <ShipToFirstName>String</ShipToFirstName>
    <ShipToLastName>String</ShipToLastName>
    <ShipToState>String</ShipToState>
    <ShipToZIPCode>String</ShipToZIPCode>
    <CustomerAddressLine1>String</CustomerAddressLine1>
    <CustomerAddressLine2>String</CustomerAddressLine2>
    <CustomerID>String</CustomerID>
    <CustomerCity>String</CustomerCity>
    <CustomerEmail>String</CustomerEmail>
    <CustomerFirstName>String</CustomerFirstName>
    <CustomerLastName>String</CustomerLastName>
    <CustomerHomePhone>String</CustomerHomePhone>
    <CustomerState>String</CustomerState>
    <CustomerZIP>String</CustomerZIP>
    <Status>String</Status>
    <OrderedFromName>String</OrderedFromName>
    <CustomerIdentification></CustomerIdentification>
    <PrimaryCustomerIndicator>String</PrimaryCustomerIndicator>
    <OrderedFromAddressLine1Text>String</OrderedFromAddressLine1Text>
    <OrderedFromAddressLine2Text>String</OrderedFromAddressLine2Text>
    <OrderedFromCityName>String</OrderedFromCityName>
    <OrderedFromStateCode>String</OrderedFromStateCode>
    <OrderedFromZip5Code>String</OrderedFromZip5Code>
    <OrderedFromZip4Code>String</OrderedFromZip4Code>
</CustomerInformation>

It needs to be converted into this:

<?xml version="1.0" encoding="UTF-8"?>
<xmlns:evt="http://www.metadata..com/Management/">
    <Identifier>3442=000-MNNN</Identifier>
    <TypeCode>Purchase History</TypeCode>
    <TypeDescription>Order Summary</TypeDescription>
    <PurposeCode>Invoice</PurposeCode>
    <Member>
        <Email>String</Email>
        <MemberSince>03/23/2000</MemberSince>
        <MemberType>
            <MemberShipTypeCode>String</MemberShipTypeCode>
            <TypeDescription>String</TypeDescription>
        </MemberType>
        <Address>
            <AddressLine1Text>String</AddressLine1Text>
            <AddressLine2Text>String</AddressLine2Text>
            <CityName>String</CityName>
            <StateCode>String</StateCode>
            <Zip5Code>String</Zip5Code>
            <Zip4Code>String</Zip4Code>
        </Address>
        <Telephone>
            <AreaCode>String</AreaCode>
            <TelephoneNumber>String</TelephoneNumber>
        </Telephone>
    </Member>
    <Company>
        <CompanyName>String</CompanyName>
        <CustomerIdentification>0.0</CustomerIdentification>
        <PrimaryCustomerIndicator>String</PrimaryCustomerIndicator>
        <CompanyAddress>
            <CompanyAddressLine1Text>String</CompanyAddressLine1Text>
            <CompanyAddressLine2Text>String</CompanyAddressLine2Text>
            <CompanyCityName>String</CompanyCityName>
            <CompanyStateCode>String</CompanyStateCode>
            <CompanyZip5Code>String</CompanyZip5Code>
            <CompanyZip4Code>String</CompanyZip4Code>
        </CompanyAddress>
    </Company>
    <Orders>
        <CreditCard>
            <CardName>String</CardName>
            <CardExpirationDate>1967-08-13</CardExpirationDate>
        </CreditCard>
        <Order>
            <Discount>String</Discount>
            <ShippingVendorName>String</ShippingVendorName>
            <ShipmentTrackingNumber>String</ShipmentTrackingNumber>
            <ShipmentTrackingLinkText>String</ShipmentTrackingLinkText>
            <CustomerName>String</CustomerName>
            <CustomerEmailAddressText>String</CustomerEmailAddressText>
            <Telephone>
                <AreaCode>String</AreaCode>
                <TelephoneNumber>String</TelephoneNumber>
            </Telephone>
            <ShippingAddress>
                <ShippingAddressLine1Text>String</ShippingAddressLine1Text>
                <ShippingAddressLine2Text>String</ShippingAddressLine2Text>
                <ShippingCareOfText>String</ShippingCareOfText>
                <ShippingCityName>String</ShippingCityName>
                <ShippingStateCode>String</ShippingStateCode>
                <ShippingZip5Code>String</ShippingZip5Code>
                <ShippingZip4Code>String</ShippingZip4Code>
            </ShippingAddress>
            <LineItem>
                <LineItemNumber>String</LineItemNumber>
                <LineItemQuantityCount>0</LineItemQuantityCount>
                <ItemOrderedIndicator>String</ItemOrderedIndicator>
                <Discount>String</Discount>
            </LineItem>
        </Order>
    </Orders>

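I was able to produce the XML by writing out the structured format by hand and extracting the relevant fields with plain node-value lookups in XSLT like the following: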

<xsl:value-of select="..."/>
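A stylesheet built that way might look roughly like the following minimal sketch (XSLT 1.0). The source paths come from the sample input above, but the pairing of source fields with target fields (for example MemberAddress to AddressLine1Text) is an assumption, and only part of the Member section is shown. Every target element is written out literally, which is exactly what makes this approach sensitive to schema changes.

```xml
<!-- Sketch of a hardcoded mapping (XSLT 1.0). Every target element is
     written literally and filled from a fixed source path.
     The source-to-target field pairing here is assumed, not given. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/CustomerInformation">
    <Member>
      <MemberSince><xsl:value-of select="MemberSince"/></MemberSince>
      <Address>
        <AddressLine1Text><xsl:value-of select="MemberAddress"/></AddressLine1Text>
        <CityName><xsl:value-of select="MemberCity"/></CityName>
        <StateCode><xsl:value-of select="MemberState"/></StateCode>
      </Address>
    </Member>
  </xsl:template>
</xsl:stylesheet>
```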

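However, I feel there may be a better way. I'd like to control how the structure is generated as I walk through the unstructured (flat) document. For example, is there a way to group the elements for all MemberAddress fields? If I could do that, I could build the Member section of the output, and I could do the same for the other elements. My concern with hardcoding the structured document is that it may change in the future, so if possible I'd like to be able to control the output. All member information in the source document should map to the Member element in the target document. Elements in the source whose names start with OrderedFrom should map to the Company fields in the target. The ShipTo elements, in turn, should map to the shipping information in the Orders section of the target, and so on. Please help!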

My concern with hardcoding the structured document is that it may change in the future.

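An XSLT stylesheet transforms data from one XML schema to another. It is unrealistic to expect that a change to either schema won't require rewriting the stylesheet.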
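What you can do is keep the renaming rules in one place. One XSLT 1.0 technique is to store the mapping as a small lookup table inside the stylesheet itself and read it back with document(''), so that when a field is renamed, ideally only one table row changes. The sketch below is illustrative only: the urn:example:field-map namespace, the map:fields / map:field element names, and the three mapped Member fields are all hypothetical choices.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:map="urn:example:field-map"
    exclude-result-prefixes="map">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical mapping table: source element name to target element name.
       Top-level elements in a non-XSLT namespace are ignored by the processor. -->
  <map:fields>
    <map:field from="MemberAddress" to="AddressLine1Text"/>
    <map:field from="MemberCity"    to="CityName"/>
    <map:field from="MemberState"   to="StateCode"/>
  </map:fields>

  <!-- document('') loads this stylesheet itself, so the table can be queried. -->
  <xsl:variable name="fields"
      select="document('')/xsl:stylesheet/map:fields/map:field"/>

  <xsl:template match="/CustomerInformation">
    <Address>
      <!-- Copy every source child that has a row in the table, renamed. -->
      <xsl:for-each select="*[name() = $fields/@from]">
        <xsl:variable name="src" select="name()"/>
        <xsl:element name="{$fields[@from = $src]/@to}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:for-each>
    </Address>
  </xsl:template>
</xsl:stylesheet>
```

Against the sample input this should emit an Address element containing AddressLine1Text, CityName and StateCode, in source document order. Note that document('') may fail to resolve when the stylesheet is loaded without a base URI (for example from an in-memory string), and some processors disable the document() function by default.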

Is there a way to group the elements for all MemberAddress fields for example?

Yes, if you have a way to identify them. For example, you could do:

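<Member>
    <!-- Context node: CustomerInformation. Copy each child whose name
         starts with "Member", dropping that prefix from the new name. -->
    <xsl:for-each select="*[starts-with(name(), 'Member')]">
        <xsl:element name="{substring-after(name(), 'Member')}">
            <xsl:value-of select="." />
        </xsl:element>
    </xsl:for-each>
</Member>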

to get:

<Member>
    <Address>String</Address>
    <ID>String</ID>
    <City>String</City>
    <Name>String</Name>
    <Type>String</Type>
    <State>String</State>
    <Since>String</Since>
</Member>

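But that doesn't match your expected output. Incidentally, your output shows a lot of data that isn't in the input at all, such as the member's e-mail address.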
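If the prefix rules from the question are what you want to drive the output, the same for-each can be wrapped in a named template and called once per rule (Member to Member, OrderedFrom to Company, ShipTo to shipping information). This is a minimal XSLT 1.0 sketch: the template name group-by-prefix, its parameters, and the Result root element are hypothetical (the target's root element has no name in the question).

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- Hypothetical helper: wraps every child of the current node whose name
       starts with $prefix in a $wrapper element, replacing $prefix with
       $newPrefix in each child's name. call-template keeps the current node. -->
  <xsl:template name="group-by-prefix">
    <xsl:param name="wrapper"/>
    <xsl:param name="prefix"/>
    <xsl:param name="newPrefix" select="''"/>
    <xsl:element name="{$wrapper}">
      <xsl:for-each select="*[starts-with(name(), $prefix)]">
        <xsl:element name="{concat($newPrefix, substring-after(name(), $prefix))}">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>

  <xsl:template match="/CustomerInformation">
    <!-- Result is a placeholder root element name. -->
    <Result>
      <xsl:call-template name="group-by-prefix">
        <xsl:with-param name="wrapper" select="'Member'"/>
        <xsl:with-param name="prefix" select="'Member'"/>
      </xsl:call-template>
      <xsl:call-template name="group-by-prefix">
        <xsl:with-param name="wrapper" select="'Company'"/>
        <xsl:with-param name="prefix" select="'OrderedFrom'"/>
        <xsl:with-param name="newPrefix" select="'Company'"/>
      </xsl:call-template>
      <xsl:call-template name="group-by-prefix">
        <xsl:with-param name="wrapper" select="'ShippingAddress'"/>
        <xsl:with-param name="prefix" select="'ShipTo'"/>
        <xsl:with-param name="newPrefix" select="'Shipping'"/>
      </xsl:call-template>
    </Result>
  </xsl:template>
</xsl:stylesheet>
```

With the sample input, the Company group comes out as CompanyName, CompanyAddressLine1Text, CompanyAddressLine2Text, CompanyCityName, CompanyStateCode, CompanyZip5Code and CompanyZip4Code, which are the names your target uses, although all of them sit directly under Company instead of being split into CompanyAddress. The ShipTo group yields names such as ShippingCity rather than ShippingCityName, so nesting and any names that don't follow a simple prefix swap still have to be spelled out explicitly.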