Giving a unique identifier to an ambiguous parent level when reading XML data

I have an XML document, simplified for this question, in the following format:

<?xml version="1.0"?>
<xml>
    <aggregateddata>
        <aggregateddata>
            <item value="abcdefg1" name="id"/>
            <item value="1" name="dataSet"/>
            <item value="Aggregates" name="dataSetLabel"/>
            <item value="Physical Flow" name="indicator"/>
            <item value="day" name="periodType"/>
            <item value="2021-10-16T06:00:00+02:00" name="periodFrom"/>
            <item value="2021-10-17T06:00:00+02:00" name="periodTo"/>
            <item value="BE" name="countryKey"/>
            <item value="L-Zone" name="bzShort"/>
        </aggregateddata>
        <aggregateddata>
            <item value="abcdefg2" name="id"/>
            <item value="1" name="dataSet"/>
            <item value="Aggregates" name="dataSetLabel"/>
            <item value="Physical Flow" name="indicator"/>
            <item value="day" name="periodType"/>
            <item value="2021-10-16T06:00:00+02:00" name="periodFrom"/>
            <item value="2021-10-17T06:00:00+02:00" name="periodTo"/>
            <item value="UK" name="countryKey"/>
            <item value="L-Zone" name="bzShort"/>
        </aggregateddata>
    </aggregateddata>
</xml>

I'd like to be able to read this, ideally with each set of values on its own row, turning it into something more SQL Server-friendly, like this:

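+----------+---------+--------------+---------------+------------+---------------------------+---------------------------+------------+---------+
| id       | dataSet | dataSetLabel | indicator     | periodType | periodFrom                | periodTo                  | countryKey | bzShort |
+----------+---------+--------------+---------------+------------+---------------------------+---------------------------+------------+---------+
| abcdefg1 | 1       | Aggregates   | Physical Flow | day        | 2021-10-16T06:00:00+02:00 | 2021-10-17T06:00:00+02:00 | BE         | L-Zone  |
| abcdefg2 | 1       | Aggregates   | Physical Flow | day        | 2021-10-16T06:00:00+02:00 | 2021-10-17T06:00:00+02:00 | UK         | L-Zone  |
+----------+---------+--------------+---------------+------------+---------------------------+---------------------------+------------+---------+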

To do this, I can read the XML with the following:

select
    XMLDataNodes.x.value('@name', 'varchar(50)') as FieldName,
    XMLDataNodes.x.value('@value', 'varchar(500)') as FieldValue
from 
    @XmlFile.nodes ('/xml/aggregateddata/aggregateddata/item') as XMLDataNodes(x)

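and then use some kind of dynamic pivot to turn the result into what I need.

The problem is that I have nothing to 'group by': there is essentially no value on the parent node that I can put alongside the items. I tried an approach like this one, https://www.sqlservercentral.com/forums/topic/how-to-uniquely-number-parent-and-child-nodes-while-reading-an-xml-document, to add an identifier for each group, but it is very slow on the actual full data set: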

select
    XMLDataNodes.x.value('@name', 'varchar(50)') as FieldName,
    XMLDataNodes.x.value('@value', 'varchar(500)') as FieldValue,
    XMLNodes.x.value('1+count(for $a in . return $a/../*[. << $a])','int') as parentID
from 
    @XmlFile.nodes ('/xml/aggregateddata/aggregateddata') as XMLNodes(x)
    cross apply XMLNodes.x.nodes('item') as XMLDataNodes(x)

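Is there a way to get the ID value faster, or to pull the data out directly without this step (or the pivot afterwards)?

Something like this might help (and should at least be faster than the dynamic XQuery):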

select 
    XMLDataNodes.x.value('(item[@name="id"]/@value)[1]', 'varchar(500)') as Id,
    Items.*
from 
    @XmlFile.nodes ('/xml/aggregateddata/aggregateddata') as XMLDataNodes(x)
    cross apply (
       select
         ItemNodes.x.value('@name', 'varchar(50)') as FieldName,
         ItemNodes.x.value('@value', 'varchar(500)') as FieldValue
       from XMLDataNodes.x.nodes('item') ItemNodes(x)
    ) Items

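Result:

+----------+--------------+---------------+
| Id       | FieldName    | FieldValue    |
+----------+--------------+---------------+
| abcdefg1 | id           | abcdefg1      |
| abcdefg1 | dataSet      | 1             |
| abcdefg1 | dataSetLabel | Aggregates    |
| abcdefg1 | indicator    | Physical Flow |
| ...      | ...          | ...           |
+----------+--------------+---------------+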
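
Once each item row carries its record's Id, the long form can be folded back into one row per record without a dynamic pivot, provided the field names are known in advance. Here is a minimal sketch using conditional aggregation. It assumes the "id" item is unique per record, uses an abbreviated copy of the sample data, and the aliases p and i and the CTE name Items are only illustrative:

```sql
-- Abbreviated sample: two records, four items each.
DECLARE @XmlFile xml = N'<xml>
    <aggregateddata>
        <aggregateddata>
            <item value="abcdefg1" name="id"/>
            <item value="1" name="dataSet"/>
            <item value="BE" name="countryKey"/>
            <item value="L-Zone" name="bzShort"/>
        </aggregateddata>
        <aggregateddata>
            <item value="abcdefg2" name="id"/>
            <item value="1" name="dataSet"/>
            <item value="UK" name="countryKey"/>
            <item value="L-Zone" name="bzShort"/>
        </aggregateddata>
    </aggregateddata>
</xml>';

-- Long form: one row per item, tagged with the id of its parent record.
WITH Items AS (
    SELECT
        p.x.value('(item[@name="id"]/@value)[1]', 'varchar(500)') AS Id,
        i.x.value('@name', 'varchar(50)')   AS FieldName,
        i.x.value('@value', 'varchar(500)') AS FieldValue
    FROM @XmlFile.nodes('/xml/aggregateddata/aggregateddata') AS p(x)
    CROSS APPLY p.x.nodes('item') AS i(x)
)
-- Wide form: pick out each known field name and collapse to one row per Id.
SELECT
    Id,
    MAX(CASE WHEN FieldName = 'dataSet'    THEN FieldValue END) AS dataSet,
    MAX(CASE WHEN FieldName = 'countryKey' THEN FieldValue END) AS countryKey,
    MAX(CASE WHEN FieldName = 'bzShort'    THEN FieldValue END) AS bzShort
FROM Items
GROUP BY Id;
```

If two records could share the same id value, their rows would be merged by the GROUP BY, so this only fits data where that value really is a key.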

Please try the following solution.

SQL Server's XQuery support is quite powerful.

The main idea is to use XPath with a predicate:

item[@name="..."]/@value
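
As a small illustration of that expression on its own, here is a sketch against a trimmed-down fragment (the variable name @fragment exists only for this example). The surrounding parentheses and [1] are there because value() requires an expression that is guaranteed to return at most one node:

```sql
DECLARE @fragment xml = N'<aggregateddata>
    <item value="abcdefg1" name="id"/>
    <item value="BE" name="countryKey"/>
</aggregateddata>';

-- Select the item whose name attribute is "countryKey", then read its value attribute.
SELECT @fragment.value('(/aggregateddata/item[@name="countryKey"]/@value)[1]', 'char(2)') AS countryKey;
```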

SQL

DECLARE @xml XML =
N'<xml>
    <aggregateddata>
        <aggregateddata>
            <item value="abcdefg1" name="id"/>
            <item value="1" name="dataSet"/>
            <item value="Aggregates" name="dataSetLabel"/>
            <item value="Physical Flow" name="indicator"/>
            <item value="day" name="periodType"/>
            <item value="2021-10-16T06:00:00+02:00" name="periodFrom"/>
            <item value="2021-10-17T06:00:00+02:00" name="periodTo"/>
            <item value="BE" name="countryKey"/>
            <item value="L-Zone" name="bzShort"/>
        </aggregateddata>
        <aggregateddata>
            <item value="abcdefg2" name="id"/>
            <item value="1" name="dataSet"/>
            <item value="Aggregates" name="dataSetLabel"/>
            <item value="Physical Flow" name="indicator"/>
            <item value="day" name="periodType"/>
            <item value="2021-10-16T06:00:00+02:00" name="periodFrom"/>
            <item value="2021-10-17T06:00:00+02:00" name="periodTo"/>
            <item value="UK" name="countryKey"/>
            <item value="L-Zone" name="bzShort"/>
        </aggregateddata>
    </aggregateddata>
</xml>';

SELECT c.value('(item[@name="id"]/@value)[1]', 'varchar(50)') as id
    , c.value('(item[@name="dataSet"]/@value)[1]', 'varchar(500)') as dataSet
    , c.value('(item[@name="dataSetLabel"]/@value)[1]', 'varchar(500)') as dataSetLabel
    , c.value('(item[@name="indicator"]/@value)[1]', 'varchar(500)') as indicator
    , c.value('(item[@name="periodType"]/@value)[1]', 'varchar(500)') as periodType
    , c.value('(item[@name="periodFrom"]/@value)[1]', 'datetimeoffset(0)') as periodFrom
    , c.value('(item[@name="periodTo"]/@value)[1]', 'datetimeoffset(0)') as periodTo
    , c.value('(item[@name="countryKey"]/@value)[1]', 'CHAR(2)') as countryKey
    , c.value('(item[@name="bzShort"]/@value)[1]', 'VARCHAR(20)') as bzShort
FROM @xml.nodes('/xml/aggregateddata/aggregateddata') as t(c);

Output

+----------+---------+--------------+---------------+------------+----------------------------+----------------------------+------------+---------+
|    id    | dataSet | dataSetLabel |   indicator   | periodType |         periodFrom         |          periodTo          | countryKey | bzShort |
+----------+---------+--------------+---------------+------------+----------------------------+----------------------------+------------+---------+
| abcdefg1 |       1 | Aggregates   | Physical Flow | day        | 2021-10-16 06:00:00 +02:00 | 2021-10-17 06:00:00 +02:00 | BE         | L-Zone  |
| abcdefg2 |       1 | Aggregates   | Physical Flow | day        | 2021-10-16 06:00:00 +02:00 | 2021-10-17 06:00:00 +02:00 | UK         | L-Zone  |
+----------+---------+--------------+---------------+------------+----------------------------+----------------------------+------------+---------+