Giving unique identifier to ambiguous parent level when reading XML data
I have an XML document, simplified for this question, in the following format:
<?xml version="1.0"?>
<xml>
<aggregateddata>
<aggregateddata>
<item value="abcdefg1" name="id"/>
<item value="1" name="dataSet"/>
<item value="Aggregates" name="dataSetLabel"/>
<item value="Physical Flow" name="indicator"/>
<item value="day" name="periodType"/>
<item value="2021-10-16T06:00:00+02:00" name="periodFrom"/>
<item value="2021-10-17T06:00:00+02:00" name="periodTo"/>
<item value="BE" name="countryKey"/>
<item value="L-Zone" name="bzShort"/>
</aggregateddata>
<aggregateddata>
<item value="abcdefg2" name="id"/>
<item value="1" name="dataSet"/>
<item value="Aggregates" name="dataSetLabel"/>
<item value="Physical Flow" name="indicator"/>
<item value="day" name="periodType"/>
<item value="2021-10-16T06:00:00+02:00" name="periodFrom"/>
<item value="2021-10-17T06:00:00+02:00" name="periodTo"/>
<item value="UK" name="countryKey"/>
<item value="L-Zone" name="bzShort"/>
</aggregateddata>
</aggregateddata>
</xml>
I'd like to be able to read this, ideally with each set of values on its own row, turning it into something more SQL Server friendly, like this:
id | dataSet | dataSetLabel | indicator | periodType | periodFrom | periodTo | countryKey | bzShort |
---|---|---|---|---|---|---|---|---|
abcdefg1 | 1 | Aggregates | Physical Flow | day | 2021-10-16T06:00:00+02:00 | 2021-10-17T06:00:00+02:00 | BE | L-Zone |
abcdefg2 | 1 | Aggregates | Physical Flow | day | 2021-10-16T06:00:00+02:00 | 2021-10-17T06:00:00+02:00 | UK | L-Zone |
To do this, I can read the XML using the following:
select
XMLDataNodes.x.value('@name', 'varchar(50)') as FieldName,
XMLDataNodes.x.value('@value', 'varchar(500)') as FieldValue
from
@XmlFile.nodes ('/xml/aggregateddata/aggregateddata/item') as XMLDataNodes(x)
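and then use some kind of dynamic pivot to turn the results into what I need.
But the problem is that I have nothing to 'group by': there is essentially no value on the parent nodes that I can put alongside the items. I tried an approach like this one, https://www.sqlservercentral.com/forums/topic/how-to-uniquely-number-parent-and-child-nodes-while-reading-an-xml-document , to add an identifier to each group, but it is very slow on the actual full data set: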
select
XMLDataNodes.x.value('@name', 'varchar(50)') as FieldName,
XMLDataNodes.x.value('@value', 'varchar(500)') as FieldValue,
XMLNodes.x.value('1+count(for $a in . return $a/../*[. << $a])','int') as parentID
from
@XmlFile.nodes ('/xml/aggregateddata/aggregateddata') as XMLNodes(x)
cross apply XMLNodes.x.nodes('item') as XMLDataNodes(x)
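Is there a way to get the ID value faster, or to pull the data out directly without needing this step (or the pivot afterwards)?
Something like this might help (and should at least be faster than the dynamic XQuery):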
select
XMLDataNodes.x.value('(item[@name="id"]/@value)[1]', 'varchar(500)') as Id,
Items.*
from
@XmlFile.nodes ('/xml/aggregateddata/aggregateddata') as XMLDataNodes(x)
cross apply (
select
ItemNodes.x.value('@name', 'varchar(50)') as FieldName,
ItemNodes.x.value('@value', 'varchar(500)') as FieldValue
from XMLDataNodes.x.nodes('item') ItemNodes(x)
) Items
Result:
Id | FieldName | FieldValue |
---|---|---|
abcdefg1 | id | abcdefg1 |
abcdefg1 | dataSet | 1 |
abcdefg1 | dataSetLabel | Aggregates |
abcdefg1 | indicator | Physical Flow |
... | ... | ... |
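Since every item row now carries its group's Id, the pivot mentioned in the question no longer has to be dynamic when the field names are known in advance. The following is only a sketch of one way to do it, using conditional aggregation (MAX(CASE ...)) grouped by Id. The cut-down @XmlFile sample, the CTE name Items, and the alias Groups are assumptions made for illustration and are not part of the original query.

```sql
-- Sketch only: @XmlFile is redeclared here with a reduced sample
-- so that the batch runs on its own.
DECLARE @XmlFile XML = N'<xml>
  <aggregateddata>
    <aggregateddata>
      <item value="abcdefg1" name="id"/>
      <item value="BE" name="countryKey"/>
      <item value="L-Zone" name="bzShort"/>
    </aggregateddata>
    <aggregateddata>
      <item value="abcdefg2" name="id"/>
      <item value="UK" name="countryKey"/>
      <item value="L-Zone" name="bzShort"/>
    </aggregateddata>
  </aggregateddata>
</xml>';

-- Items (hypothetical name) is the name/value rowset from above,
-- tagged with the id of the parent group.
WITH Items AS (
    SELECT
        Groups.x.value('(item[@name="id"]/@value)[1]', 'varchar(500)') AS Id,
        ItemNodes.x.value('@name',  'varchar(50)')  AS FieldName,
        ItemNodes.x.value('@value', 'varchar(500)') AS FieldValue
    FROM @XmlFile.nodes('/xml/aggregateddata/aggregateddata') AS Groups(x)
    CROSS APPLY Groups.x.nodes('item') AS ItemNodes(x)
)
-- One output row per Id; each CASE picks out a single field name.
SELECT
    Id,
    MAX(CASE WHEN FieldName = 'countryKey' THEN FieldValue END) AS countryKey,
    MAX(CASE WHEN FieldName = 'bzShort'    THEN FieldValue END) AS bzShort
FROM Items
GROUP BY Id
ORDER BY Id;
```

This assumes the id item is present and unique in every group; groups without one would all collapse into a single NULL row.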
Please try the following solution.
SQL Server's XQuery is quite powerful.
The main idea is to use XPath with a predicate:
item[@name="..."]/@value
SQL
DECLARE @xml XML =
N'<xml>
<aggregateddata>
<aggregateddata>
<item value="abcdefg1" name="id"/>
<item value="1" name="dataSet"/>
<item value="Aggregates" name="dataSetLabel"/>
<item value="Physical Flow" name="indicator"/>
<item value="day" name="periodType"/>
<item value="2021-10-16T06:00:00+02:00" name="periodFrom"/>
<item value="2021-10-17T06:00:00+02:00" name="periodTo"/>
<item value="BE" name="countryKey"/>
<item value="L-Zone" name="bzShort"/>
</aggregateddata>
<aggregateddata>
<item value="abcdefg2" name="id"/>
<item value="1" name="dataSet"/>
<item value="Aggregates" name="dataSetLabel"/>
<item value="Physical Flow" name="indicator"/>
<item value="day" name="periodType"/>
<item value="2021-10-16T06:00:00+02:00" name="periodFrom"/>
<item value="2021-10-17T06:00:00+02:00" name="periodTo"/>
<item value="UK" name="countryKey"/>
<item value="L-Zone" name="bzShort"/>
</aggregateddata>
</aggregateddata>
</xml>';
SELECT c.value('(item[@name="id"]/@value)[1]', 'varchar(50)') as id
, c.value('(item[@name="dataSet"]/@value)[1]', 'varchar(500)') as dataSet
, c.value('(item[@name="dataSetLabel"]/@value)[1]', 'varchar(500)') as dataSetLabel
, c.value('(item[@name="indicator"]/@value)[1]', 'varchar(500)') as indicator
, c.value('(item[@name="periodType"]/@value)[1]', 'varchar(500)') as periodType
, c.value('(item[@name="periodFrom"]/@value)[1]', 'datetimeoffset(0)') as periodFrom
, c.value('(item[@name="periodTo"]/@value)[1]', 'datetimeoffset(0)') as periodTo
, c.value('(item[@name="countryKey"]/@value)[1]', 'CHAR(2)') as countryKey
, c.value('(item[@name="bzShort"]/@value)[1]', 'VARCHAR(20)') as bzShort
FROM @xml.nodes('/xml/aggregateddata/aggregateddata') as t(c);
Output
+----------+---------+--------------+---------------+------------+----------------------------+----------------------------+------------+---------+
| id | dataSet | dataSetLabel | indicator | periodType | periodFrom | periodTo | countryKey | bzShort |
+----------+---------+--------------+---------------+------------+----------------------------+----------------------------+------------+---------+
| abcdefg1 | 1 | Aggregates | Physical Flow | day | 2021-10-16 06:00:00 +02:00 | 2021-10-17 06:00:00 +02:00 | BE | L-Zone |
| abcdefg2 | 1 | Aggregates | Physical Flow | day | 2021-10-16 06:00:00 +02:00 | 2021-10-17 06:00:00 +02:00 | UK | L-Zone |
+----------+---------+--------------+---------------+------------+----------------------------+----------------------------+------------+---------+
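If an integer identifier per parent group is still wanted alongside the columns, one option is to number the shredded rows afterwards instead of counting preceding siblings in XQuery. The sketch below assumes that the id item is unique per group; the CTE name Groups and the column name parentID are illustrative. Note that the numbering follows the sort order of id, not the document order of the XML.

```sql
-- Sketch only: reduced sample data so that the batch runs on its own.
DECLARE @xml XML = N'<xml>
  <aggregateddata>
    <aggregateddata>
      <item value="abcdefg1" name="id"/>
      <item value="BE" name="countryKey"/>
    </aggregateddata>
    <aggregateddata>
      <item value="abcdefg2" name="id"/>
      <item value="UK" name="countryKey"/>
    </aggregateddata>
  </aggregateddata>
</xml>';

-- Shred one row per parent node, using the same predicate technique.
WITH Groups AS (
    SELECT c.value('(item[@name="id"]/@value)[1]', 'varchar(50)') AS id
         , c.value('(item[@name="countryKey"]/@value)[1]', 'CHAR(2)') AS countryKey
    FROM @xml.nodes('/xml/aggregateddata/aggregateddata') AS t(c)
)
-- Number the rows by id (assumed unique) to get a surrogate key.
SELECT ROW_NUMBER() OVER (ORDER BY id) AS parentID
     , id
     , countryKey
FROM Groups
ORDER BY parentID;
```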