SQL Server 2012 XML Flatten and reduce fan out / duplicates

I'm having trouble getting some XML data (stored as text in an old MS SQL Server 2012 database) parsed and converted into a usable format.

The XML data is a string, but when I convert it to XML it looks like this:

<?xml version="1.0" encoding="utf-8"?>
  <header1>
    <header2>
      <OrderFormHeader>
        <AccountNum>123456</AccountNum>
        <OrderNum>000123987</OrderNum>
        <OrderDetails>
          <CompanyName>Biznez1</CompanyName>
          <CompAddressInfo>
            <City>Phoenix</City>
            <State>AZ</State>
          </CompAddressInfo>
          <ShipTo>TRUE</ShipTo>
          <BillTo>FALSE</BillTo>
        </OrderDetails>
      </OrderFormHeader>
      <OrderFormDetails>
        <OrderFormLines>
          <ItemNum>000001</ItemNum>
          <InventoryNum>INV-001-000001</InventoryNum>
          <OtherDetails>
            <QtyOrdered>1</QtyOrdered>
            <ItemDesc>Bandaids</ItemDesc>
            <UnitofMeasure>Box</UnitofMeasure>
            <ItemCode>
              <CodeType>UPC</CodeType>
              <CodeID>123456789123</CodeID>
            </ItemCode>
          </OtherDetails>
        </OrderFormLines>
      </OrderFormDetails>
      <OrderFormDetails>
        <OrderFormLines>
          <ItemNum>000002</ItemNum>
          <InventoryNum>INV-001-000002</InventoryNum>
          <OtherDetails>
            <QtyOrdered>1</QtyOrdered>
            <ItemDesc>QTips</ItemDesc>
            <UnitofMeasure>Box</UnitofMeasure>
            <ItemCode>
              <CodeType>UPC</CodeType>
              <CodeID>123456789987</CodeID>
            </ItemCode>
          </OtherDetails>
        </OrderFormLines>
        <OrderFormLines>
          <ItemNum>000003</ItemNum>
          <InventoryNum>INV-003-000001</InventoryNum>
          <OtherDetails>
            <QtyOrdered>1</QtyOrdered>
            <ItemDesc>Scissors</ItemDesc>
            <UnitofMeasure>Each</UnitofMeasure>
            <ItemCode>
              <CodeType>UPC</CodeType>
              <CodeID>123456987321</CodeID>
            </ItemCode>
          </OtherDetails>
        </OrderFormLines>
      </OrderFormDetails>
    </header2>
  </header1>

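Needless to say, this is crazy XML (at least to me). (Note: there are multiple sets of OrderFormDetails nested in the object, and parsing them with my code seems to fan out on ItemNum and InventoryNum. I removed the UPC code content because it caused additional fan-out, but I don't mind bringing it back into my code.)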

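That said, my current SQL code uses a table variable to pull the data from the table, fix the UTF-8 encoding declaration, and convert it to XML. From there I use CROSS APPLY to extract the data, but it has a serious fan-out problem: it shows the same data several times instead of just one row per item: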

DECLARE @xml TABLE (IMPORTED_XML xml);

INSERT INTO @xml
SELECT
  CAST(REPLACE(mxt.XML_FIELD,'encoding="UTF-8"','encoding="UTF-16"') AS XML) AS IMPORTED_XML
FROM MyXMLTable as mxt;

with temp1 AS (
  SELECT DISTINCT
      sales_order.value('(./AccountNum/text())[1]','nvarchar(max)') AS ACCOUNT_NUM
    , sales_order.value('(./OrderNum/text())[1]','nvarchar(max)') AS ORDER_NUM
    , extra_so.value('(./CompanyName/text())[1]','nvarchar(max)') AS COMPANY_NAME
    , base.value('(./ItemNum/text())[1]','nvarchar(max)') AS ITEM_ID
    , base.value('(./InventoryNum/text())[1]','nvarchar(max)') AS INVENTORY_NUM
    , sales.value('(./QtyOrdered/text())[1]','nvarchar(max)') AS QTY_ORDERED
    , sales.value('(./UnitofMeasure/text())[1]','nvarchar(max)') AS ITEM_UOM
    , sales.value('(./ItemDesc/text())[1]','nvarchar(max)') AS ITEM_DESC
  FROM @xml
    CROSS APPLY IMPORTED_XML.nodes('/header1/header2') AS core(core)
    CROSS APPLY core.nodes('//OrderFormDetails/OrderFormLines') as base(base)
    CROSS APPLY core.nodes('//OrderFormHeader') AS sales_order(sales_order)
    CROSS APPLY base.nodes('//OtherDetails') as sales(sales)
    CROSS APPLY sales_order.nodes('//OrderDetails') AS extra_so(extra_so)
    CROSS APPLY sales.nodes('//ItemCode') as itmcode(itmcode)
)

select * from temp1 order by item_desc asc;

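This mostly works, but it ends up producing multiple rows for the same data. I'm used to using LATERAL FLATTEN in Snowflake, but not to parsing XML like this in SQL Server 2012. Any insight? Thanks in advance for your help.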

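Your problem is that you are using //. A path that starts with // is evaluated from the root of the document, not from the current node, so every CROSS APPLY cross-joins every matching node in the whole document instead of just the children of the row it was applied to.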
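A minimal sketch of the effect, using a hypothetical two-line document held in a variable rather than the question's table. Even though .nodes() is called on a row produced by an earlier .nodes(), the path //OtherDetails restarts at the document root, so each line is paired with both OtherDetails elements; the relative path OtherDetails only sees the current line's child.

```sql
-- Hypothetical two-line document (not the question's table), used only to count rows.
DECLARE @x xml = N'<header1><header2>
  <OrderFormDetails>
    <OrderFormLines><ItemNum>000001</ItemNum><OtherDetails><ItemDesc>Bandaids</ItemDesc></OtherDetails></OrderFormLines>
    <OrderFormLines><ItemNum>000002</ItemNum><OtherDetails><ItemDesc>QTips</ItemDesc></OtherDetails></OrderFormLines>
  </OrderFormDetails>
</header2></header1>';

-- '//OtherDetails' starts again from the document root,
-- so each of the 2 lines is paired with both OtherDetails elements: 4 rows.
SELECT l.line.value('(ItemNum/text())[1]', 'varchar(10)') AS ITEM_ID,
       d.det.value('(ItemDesc/text())[1]', 'nvarchar(50)') AS ITEM_DESC
FROM @x.nodes('/header1/header2/OrderFormDetails/OrderFormLines') AS l(line)
CROSS APPLY l.line.nodes('//OtherDetails') AS d(det);

-- 'OtherDetails' (no leading slashes) is relative to the current line: 2 rows.
SELECT l.line.value('(ItemNum/text())[1]', 'varchar(10)') AS ITEM_ID,
       d.det.value('(ItemDesc/text())[1]', 'nvarchar(50)') AS ITEM_DESC
FROM @x.nodes('/header1/header2/OrderFormDetails/OrderFormLines') AS l(line)
CROSS APPLY l.line.nodes('OtherDetails') AS d(det);
```

With three lines, as in the question, the first form would give 3 x 3 = 9 rows, and each further // level multiplies again.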

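A few other things to note:

  • You don't need a temporary table; you can CROSS APPLY everything together in a single query.
  • If the column is already varchar you don't need the REPLACE; you only need it when the column is nvarchar, because an nvarchar string is UTF-16 and SQL Server refuses to parse it while the declaration says UTF-8.
  • You don't need .nodes at every level of nesting, only where a level contains multiple items you need. A single child can be reached with a path inside .value(), e.g. OrderDetails/CompanyName.
  • Choose your data types carefully: does everything really need to be nvarchar(max)?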
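To illustrate the varchar/nvarchar point, here is a sketch with two hypothetical variables holding the same one-element document with a utf-8 declaration. The varchar copy casts as it is; the nvarchar copy only casts once the declaration is rewritten to utf-16 (casting it directly fails with an "unable to switch the encoding" parsing error).

```sql
-- The same tiny document held two ways (hypothetical values, not the question's table).
DECLARE @asVarchar  varchar(max)  =  '<?xml version="1.0" encoding="utf-8"?><header1/>';
DECLARE @asNvarchar nvarchar(max) = N'<?xml version="1.0" encoding="utf-8"?><header1/>';

-- varchar input: the utf-8 declaration is accepted, no REPLACE needed.
SELECT CAST(@asVarchar AS xml) AS from_varchar;

-- nvarchar input is UTF-16, so the declaration has to say so before the cast.
SELECT CAST(REPLACE(@asNvarchar, 'encoding="utf-8"', 'encoding="utf-16"') AS xml) AS from_nvarchar;
```

If the text column is nvarchar but only ever holds characters from the database code page, casting it to varchar(max) first is another way to keep the utf-8 declaration valid.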
SELECT 
    sales_order.value('(AccountNum/text())[1]','varchar(50)') AS ACCOUNT_NUM
  , sales_order.value('(OrderNum/text())[1]','varchar(50)') AS ORDER_NUM
  , sales_order.value('(OrderDetails/CompanyName/text())[1]','nvarchar(200)') AS COMPANY_NAME
  , base.value('(ItemNum/text())[1]','varchar(50)')  AS ITEM_ID
  , base.value('(InventoryNum/text())[1]','varchar(50)') AS INVENTORY_NUM
  , sales.value('(QtyOrdered/text())[1]','int')  AS QTY_ORDERED
  , sales.value('(UnitofMeasure/text())[1]','varchar(20)') AS ITEM_UOM
  , sales.value('(ItemDesc/text())[1]','nvarchar(max)') AS ITEM_DESC
  , itmcode.value('(CodeType/text())[1]','varchar(20)') AS itemcodetype
  , itmcode.value('(CodeID/text())[1]','varchar(50)') AS itemcodeID
      

FROM MyXMLTable as mxt
CROSS APPLY (VALUES( CAST(REPLACE(mxt.XML_FIELD,'encoding="UTF-8"','encoding="UTF-16"') AS xml) )) v(IMPORTED_XML)
CROSS APPLY IMPORTED_XML.nodes('/header1/header2') AS core(core)
CROSS APPLY core.nodes('OrderFormHeader') AS sales_order(sales_order)
CROSS APPLY core.nodes('OrderFormDetails/OrderFormLines') as base(base)
CROSS APPLY base.nodes('OtherDetails') as sales(sales)
CROSS APPLY sales.nodes('ItemCode') as itmcode(itmcode);
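To try the query without MyXMLTable, the same pattern can be pointed at a variable. This is a sketch assuming a hypothetical nvarchar copy of the question's sample, trimmed to the elements read here (so the ItemCode level is left out) and with the OtherDetails closing tags well-formed; it should return one row per OrderFormLines element, three in total.

```sql
-- Hypothetical stand-in for MyXMLTable.XML_FIELD: the question's sample trimmed to the
-- elements read below, stored as nvarchar, so the encoding REPLACE is required.
DECLARE @raw nvarchar(max) = N'<?xml version="1.0" encoding="UTF-8"?>
<header1><header2>
  <OrderFormHeader>
    <AccountNum>123456</AccountNum>
    <OrderNum>000123987</OrderNum>
    <OrderDetails><CompanyName>Biznez1</CompanyName></OrderDetails>
  </OrderFormHeader>
  <OrderFormDetails>
    <OrderFormLines><ItemNum>000001</ItemNum>
      <OtherDetails><QtyOrdered>1</QtyOrdered><ItemDesc>Bandaids</ItemDesc></OtherDetails>
    </OrderFormLines>
  </OrderFormDetails>
  <OrderFormDetails>
    <OrderFormLines><ItemNum>000002</ItemNum>
      <OtherDetails><QtyOrdered>1</QtyOrdered><ItemDesc>QTips</ItemDesc></OtherDetails>
    </OrderFormLines>
    <OrderFormLines><ItemNum>000003</ItemNum>
      <OtherDetails><QtyOrdered>1</QtyOrdered><ItemDesc>Scissors</ItemDesc></OtherDetails>
    </OrderFormLines>
  </OrderFormDetails>
</header2></header1>';

SELECT
    sales_order.value('(OrderNum/text())[1]', 'varchar(50)') AS ORDER_NUM
    -- single child one level down: reached by path, no extra .nodes()
  , sales_order.value('(OrderDetails/CompanyName/text())[1]', 'nvarchar(200)') AS COMPANY_NAME
  , base.value('(ItemNum/text())[1]', 'varchar(50)') AS ITEM_ID
    -- typed extraction instead of nvarchar(max)
  , sales.value('(QtyOrdered/text())[1]', 'int') AS QTY_ORDERED
  , sales.value('(ItemDesc/text())[1]', 'nvarchar(200)') AS ITEM_DESC
FROM (VALUES (CAST(REPLACE(@raw, 'encoding="UTF-8"', 'encoding="UTF-16"') AS xml))) v(IMPORTED_XML)
CROSS APPLY v.IMPORTED_XML.nodes('/header1/header2') AS core(core)
-- relative paths (no leading //), so each level only sees its own children
CROSS APPLY core.nodes('OrderFormHeader') AS sales_order(sales_order)
CROSS APPLY core.nodes('OrderFormDetails/OrderFormLines') AS base(base)
CROSS APPLY base.nodes('OtherDetails') AS sales(sales)
ORDER BY ITEM_ID;
```

Because OrderFormDetails/OrderFormLines is applied from header2, lines from both OrderFormDetails sets come back, without any DISTINCT.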
