SQL Server 2012 XML 展平并减少扇出/重复
SQL Server 2012 XML Flatten and reduce fan out / duplicates
我在尝试获取一些 XML 数据(在旧 MS SQL Server 2012 中作为文本存储)解析并转换为可用格式时遇到了一些问题。
XML数据是一个字符串,但是当我把它转换成XML时,它看起来像这样:
<?xml version="1.0" encoding="utf-8"?>
<header1>
<header2>
<OrderFormHeader>
<AccountNum>123456</AccountNum>
<OrderNum>000123987</OrderNum>
<OrderDetails>
<CompanyName>Biznez1</CompanyName>
<CompAddressInfo>
<City>Phoenix</City>
<State>AZ</State>
</CompAddressInfo>
<ShipTo>TRUE</ShipTo>
<BillTo>FALSE</BillTo>
</OrderDetails>
</OrderFormHeader>
<OrderFormDetails>
<OrderFormLines>
<ItemNum>000001</ItemNum>
<InventoryNum>INV-001-000001</InventoryNum>
<OtherDetails>
<QtyOrdered>1</QtyOrdered>
<ItemDesc>Bandaids</ItemDesc>
<UnitofMeasure>Box</UnitofMeasure>
<ItemCode>
<CodeType>UPC</CodeType>
<CodeID>123456789123</CodeID>
</ItemCode>
<OtherDetails>
</OrderFormLines>
</OrderFormDetails>
<OrderFormLines>
<ItemNum>000002</ItemNum>
<InventoryNum>INV-001-000002</InventoryNum>
<OtherDetails>
<QtyOrdered>1</QtyOrdered>
<ItemDesc>QTips</ItemDesc>
<UnitofMeasure>Box</UnitofMeasure>
<ItemCode>
<CodeType>UPC</CodeType>
<CodeID>123456789987</CodeID>
</ItemCode>
<OtherDetails>
</OrderFormLines>
<OrderFormLines>
<ItemNum>000003</ItemNum>
<InventoryNum>INV-003-000001</InventoryNum>
<OtherDetails>
<QtyOrdered>1</QtyOrdered>
<ItemDesc>Scissors</ItemDesc>
<UnitofMeasure>Each</UnitofMeasure>
<ItemCode>
<CodeType>UPC</CodeType>
<CodeID>123456987321</CodeID>
</ItemCode>
<OtherDetails>
</OrderFormLines>
</header2>
</header1>
不用说,这太疯狂了 XML(至少对我而言)。
(注意:对象中嵌套了多组 OrderFormDetails,通过我的代码解析它们似乎在 ItemNum 和 InventoryNum 上散开。我删除了 UPC 代码内容,因为那会导致额外的散开,但不介意将其带回我的代码中)
话虽如此,我当前的 SQL 代码使用 table 变量从 table 中获取数据,更正 UTF-8 并将其放入 XML 格式。从那里,我使用 CROSS APPLY 函数来获取数据,但它有严重的扇出问题,它会多次显示数据,而不是每次只显示 1 行:
DECLARE @xml TABLE (IMPORTED_XML xml)
INSERT INTO @xml
SELECT
CAST(REPLACE(mxt.XML_FIELD,'encoding="UTF-8"','encoding="UTF-16"') AS XML) AS IMPORTED_XML
FROM MyXMLTable as mxt
with temp1 AS (
SELECT DISTINCT
sales_order.value('(./AccountNum/text())[1]','nvarchar(max)') AS ACCOUNT_NUM
, sales_order.value('(./OrderNum/text())[1]','nvarchar(max)') AS ORDER_NUM
, extra_so.value('(./CompanyName/text())[1]','nvarchar(max)') AS COMPANY_NAME
, base.value('(./ItemNum/text())[1]','nvarchar(max)') AS ITEM_ID
, base.value('(./InventoryNum/text())[1]','nvarchar(max)') AS INVENTORY_NUM
, sales.value('(./QtyOrdered/text())[1]','nvarchar(max)') AS QTY_ORDERED
, sales.value('(./UnitofMeasure/text())[1]','nvarchar(max)') AS ITEM_UOM
, sales.value('(./ItemDesc/text())[1]','nvarchar(max)') AS ITEM_DESC
FROM @xml
CROSS APPLY IMPORTED_XML.nodes('/header1/header2') AS core(core)
CROSS APPLY core.nodes('//OrderFormDetails/OrderFormLines') as base(base)
CROSS APPLY core.nodes('//OrderFormHeader') AS sales_order(sales_order)
CROSS APPLY base.nodes('//OtherDetails') as sales(sales)
CROSS APPLY sales_order.nodes('//OrderDetails') AS extra_so(extra_so)
CROSS APPLY sales.nodes('//ItemCode') as itmcode(itmcode)
)
select * from temp1 order by item_desc asc
这似乎主要 有效,但它最终会为相同的内容生成多行数据...我习惯使用 lateral flatten Snowflake 中的函数,但在 SQL Server 2012 中不是这个 XML 解析。对此有何见解?预先感谢您的帮助
你的问题是你从根开始一直交叉连接每个嵌套节点,因为你正在使用 //
.
还有其他需要注意的地方:
- 您不需要临时表,您可以
CROSS APPLY
在一个查询中将所有内容放在一起
- 如果该列已经是
varchar
,则不需要 REPLACE
,仅当它是 nvarchar
。
- 您不需要在每一层嵌套上都使用
.nodes
,只有当您需要来自一个层的多个项目时才需要它。
- 仔细选择你的数据类型,所有的东西都是
nvarchar(max)
吗?
SELECT
sales_order.value('(AccountNum/text())[1]','varchar(50)') AS ACCOUNT_NUM
, sales_order.value('(OrderNum/text())[1]','varchar(50)') AS ORDER_NUM
, sales_order.value('(OrderDetails/CompanyName/text())[1]','nvarchar(200)') AS COMPANY_NAME
, base.value('(ItemNum/text())[1]','varchar(50)') AS ITEM_ID
, base.value('(InventoryNum/text())[1]','varchar(50)') AS INVENTORY_NUM
, sales.value('(QtyOrdered/text())[1]','int') AS QTY_ORDERED
, sales.value('(UnitofMeasure/text())[1]','varchar(20)') AS ITEM_UOM
, sales.value('(ItemDesc/text())[1]','nvarchar(max)') AS ITEM_DESC
, itmcode.value('(CodeType/text())[1]','varchar(20)') AS itemcodetype
, itmcode.value('(CodeID/text())[1]','varchar(50)') AS itemcodeID
FROM MyXMLTable as mxt
CROSS APPLY (VALUES( CAST(REPLACE(mxt.XML_FIELD,'encoding="UTF-8"','encoding="UTF-16"') AS xml) )) v(IMPORTED_XML)
CROSS APPLY IMPORTED_XML.nodes('/header1/header2') AS core(core)
CROSS APPLY core.nodes('OrderFormHeader') AS sales_order(sales_order)
CROSS APPLY core.nodes('OrderFormDetails/OrderFormLines') as base(base)
CROSS APPLY base.nodes('OtherDetails') as sales(sales)
CROSS APPLY sales.nodes('ItemCode') as itmcode(itmcode);
我在尝试获取一些 XML 数据(在旧 MS SQL Server 2012 中作为文本存储)解析并转换为可用格式时遇到了一些问题。
XML数据是一个字符串,但是当我把它转换成XML时,它看起来像这样:
<?xml version="1.0" encoding="utf-8"?>
<header1>
<header2>
<OrderFormHeader>
<AccountNum>123456</AccountNum>
<OrderNum>000123987</OrderNum>
<OrderDetails>
<CompanyName>Biznez1</CompanyName>
<CompAddressInfo>
<City>Phoenix</City>
<State>AZ</State>
</CompAddressInfo>
<ShipTo>TRUE</ShipTo>
<BillTo>FALSE</BillTo>
</OrderDetails>
</OrderFormHeader>
<OrderFormDetails>
<OrderFormLines>
<ItemNum>000001</ItemNum>
<InventoryNum>INV-001-000001</InventoryNum>
<OtherDetails>
<QtyOrdered>1</QtyOrdered>
<ItemDesc>Bandaids</ItemDesc>
<UnitofMeasure>Box</UnitofMeasure>
<ItemCode>
<CodeType>UPC</CodeType>
<CodeID>123456789123</CodeID>
</ItemCode>
<OtherDetails>
</OrderFormLines>
</OrderFormDetails>
<OrderFormLines>
<ItemNum>000002</ItemNum>
<InventoryNum>INV-001-000002</InventoryNum>
<OtherDetails>
<QtyOrdered>1</QtyOrdered>
<ItemDesc>QTips</ItemDesc>
<UnitofMeasure>Box</UnitofMeasure>
<ItemCode>
<CodeType>UPC</CodeType>
<CodeID>123456789987</CodeID>
</ItemCode>
<OtherDetails>
</OrderFormLines>
<OrderFormLines>
<ItemNum>000003</ItemNum>
<InventoryNum>INV-003-000001</InventoryNum>
<OtherDetails>
<QtyOrdered>1</QtyOrdered>
<ItemDesc>Scissors</ItemDesc>
<UnitofMeasure>Each</UnitofMeasure>
<ItemCode>
<CodeType>UPC</CodeType>
<CodeID>123456987321</CodeID>
</ItemCode>
<OtherDetails>
</OrderFormLines>
</header2>
</header1>
不用说,这太疯狂了 XML(至少对我而言)。 (注意:对象中嵌套了多组 OrderFormDetails,通过我的代码解析它们似乎在 ItemNum 和 InventoryNum 上散开。我删除了 UPC 代码内容,因为那会导致额外的散开,但不介意将其带回我的代码中)
话虽如此,我当前的 SQL 代码使用 table 变量从 table 中获取数据,更正 UTF-8 并将其放入 XML 格式。从那里,我使用 CROSS APPLY 函数来获取数据,但它有严重的扇出问题,它会多次显示数据,而不是每次只显示 1 行:
DECLARE @xml TABLE (IMPORTED_XML xml)
INSERT INTO @xml
SELECT
CAST(REPLACE(mxt.XML_FIELD,'encoding="UTF-8"','encoding="UTF-16"') AS XML) AS IMPORTED_XML
FROM MyXMLTable as mxt
with temp1 AS (
SELECT DISTINCT
sales_order.value('(./AccountNum/text())[1]','nvarchar(max)') AS ACCOUNT_NUM
, sales_order.value('(./OrderNum/text())[1]','nvarchar(max)') AS ORDER_NUM
, extra_so.value('(./CompanyName/text())[1]','nvarchar(max)') AS COMPANY_NAME
, base.value('(./ItemNum/text())[1]','nvarchar(max)') AS ITEM_ID
, base.value('(./InventoryNum/text())[1]','nvarchar(max)') AS INVENTORY_NUM
, sales.value('(./QtyOrdered/text())[1]','nvarchar(max)') AS QTY_ORDERED
, sales.value('(./UnitofMeasure/text())[1]','nvarchar(max)') AS ITEM_UOM
, sales.value('(./ItemDesc/text())[1]','nvarchar(max)') AS ITEM_DESC
FROM @xml
CROSS APPLY IMPORTED_XML.nodes('/header1/header2') AS core(core)
CROSS APPLY core.nodes('//OrderFormDetails/OrderFormLines') as base(base)
CROSS APPLY core.nodes('//OrderFormHeader') AS sales_order(sales_order)
CROSS APPLY base.nodes('//OtherDetails') as sales(sales)
CROSS APPLY sales_order.nodes('//OrderDetails') AS extra_so(extra_so)
CROSS APPLY sales.nodes('//ItemCode') as itmcode(itmcode)
)
select * from temp1 order by item_desc asc
这似乎主要 有效,但它最终会为相同的内容生成多行数据...我习惯使用 lateral flatten Snowflake 中的函数,但在 SQL Server 2012 中不是这个 XML 解析。对此有何见解?预先感谢您的帮助
你的问题是你从根开始一直交叉连接每个嵌套节点,因为你正在使用 //
.
还有其他需要注意的地方:
- 您不需要临时表,您可以
CROSS APPLY
在一个查询中将所有内容放在一起 - 如果该列已经是
varchar
,则不需要REPLACE
,仅当它是nvarchar
。 - 您不需要在每一层嵌套上都使用
.nodes
,只有当您需要来自一个层的多个项目时才需要它。 - 仔细选择你的数据类型,所有的东西都是
nvarchar(max)
吗?
SELECT
sales_order.value('(AccountNum/text())[1]','varchar(50)') AS ACCOUNT_NUM
, sales_order.value('(OrderNum/text())[1]','varchar(50)') AS ORDER_NUM
, sales_order.value('(OrderDetails/CompanyName/text())[1]','nvarchar(200)') AS COMPANY_NAME
, base.value('(ItemNum/text())[1]','varchar(50)') AS ITEM_ID
, base.value('(InventoryNum/text())[1]','varchar(50)') AS INVENTORY_NUM
, sales.value('(QtyOrdered/text())[1]','int') AS QTY_ORDERED
, sales.value('(UnitofMeasure/text())[1]','varchar(20)') AS ITEM_UOM
, sales.value('(ItemDesc/text())[1]','nvarchar(max)') AS ITEM_DESC
, itmcode.value('(CodeType/text())[1]','varchar(20)') AS itemcodetype
, itmcode.value('(CodeID/text())[1]','varchar(50)') AS itemcodeID
FROM MyXMLTable as mxt
CROSS APPLY (VALUES( CAST(REPLACE(mxt.XML_FIELD,'encoding="UTF-8"','encoding="UTF-16"') AS xml) )) v(IMPORTED_XML)
CROSS APPLY IMPORTED_XML.nodes('/header1/header2') AS core(core)
CROSS APPLY core.nodes('OrderFormHeader') AS sales_order(sales_order)
CROSS APPLY core.nodes('OrderFormDetails/OrderFormLines') as base(base)
CROSS APPLY base.nodes('OtherDetails') as sales(sales)
CROSS APPLY sales.nodes('ItemCode') as itmcode(itmcode);