Take out duplicates from XML

I need some help with my query. I want to get the TradeIds that are duplicated and are missing a LegId. Can you help me?

My XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<data>
<value>
    <TradeId>928</TradeId>
    <LegId>1</LegId>
</value>
<value>
    <TradeId>928</TradeId>
    <LegId>2</LegId>
</value>
<value>
    <TradeId>928</TradeId>
    <!-- MISSING LEGID HERE -->
</value>
<value>
    <TradeId>929</TradeId>
    <LegId>1</LegId>
</value>
<value>
    <TradeId>929</TradeId>
    <LegId>2</LegId>
</value>
<value>
    <TradeId>930</TradeId>
    <LegId>2</LegId>
</value>
</data>
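
To reproduce the problem, the XML above can be loaded into the @xmlData variable that the queries below read from. This is a minimal sketch that assumes @xmlData is of type xml. The <?xml ... encoding="UTF-8"?> declaration is left out because SQL Server rejects a UTF-8 encoding declaration inside an N'...' (UTF-16) string literal, and the missing LegId is marked with an XML comment.

```sql
-- Sketch: load the sample data into an xml variable named @xmlData,
-- the name used by the queries in this thread (type xml is an assumption).
-- No <?xml ...?> declaration: an N'...' literal is UTF-16 and SQL Server
-- will not switch to a declared UTF-8 encoding.
DECLARE @xmlData xml = N'
<data>
  <value><TradeId>928</TradeId><LegId>1</LegId></value>
  <value><TradeId>928</TradeId><LegId>2</LegId></value>
  <value><TradeId>928</TradeId><!-- LegId missing --></value>
  <value><TradeId>929</TradeId><LegId>1</LegId></value>
  <value><TradeId>929</TradeId><LegId>2</LegId></value>
  <value><TradeId>930</TradeId><LegId>2</LegId></value>
</data>';

-- Shred the XML: one row per <value>; LegId comes back NULL
-- when the <LegId> element is absent.
SELECT
     e.value('TradeId[1]', 'varchar(50)') AS strTradeId
    ,e.value('LegId[1]', 'int')           AS LegId
FROM @xmlData.nodes('data/value') AS elements(e);
```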

I declare this XML as a variable (@xmlData) and then fill a temp table with the result:

SELECT *
INTO #tradeIdDuplicatesToIgnore
FROM
(
    SELECT 
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value') AS elements(e)
    WHERE   1 = 1
) AS t



SELECT   *
FROM    #tradeIdDuplicatesToIgnore AS t

This gives me the following output:

strTradeId  LegId
928         1
928         2
928         NULL
929         1
929         2
930         2

In this case the only row I need is row 3, the one with LegId NULL (and I only need the TradeId column). This query:

SELECT t.strTradeId
INTO #tradeIdDuplicatesToIgnore
FROM
(
    SELECT 
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value') AS elements(e)
) AS t
WHERE   1 = 1
--AND       t.LegId IS NULL
GROUP BY  t.strTradeId
HAVING COUNT(t.strTradeId) > 1


SELECT   *
FROM    #tradeIdDuplicatesToIgnore AS t

leaves two rows, 928 and 929, but I can't get to the row where LegId is NULL...

Requested output for this case: TradeId 928.

Can you help me with this?

One possible approach is to change the XPath in the FROM clause so that it selects only the <value> elements that have no <LegId> child:

data/value[not(LegId)]

Here is the XPath in use:

SELECT *
INTO #tradeIdDuplicatesToIgnore
FROM
(
    SELECT 
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value[not(LegId)]') AS elements(e)
    WHERE   1 = 1
) AS t

SELECT   *
FROM    #tradeIdDuplicatesToIgnore AS t

Output:

strTradeId  LegId
928         NULL

Update:

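I missed the requirement to check for duplicates earlier, so here is a different way to achieve the same thing, with the duplicate check added: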

SELECT *
INTO #tradeIdDuplicatesToIgnore
FROM
(
    SELECT 
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value') AS elements(e)
    WHERE   1 = 1
) AS t

SELECT   t.strTradeId
FROM    #tradeIdDuplicatesToIgnore AS t
        INNER JOIN 
        (
            SELECT COUNT(*) AS cnt, strTradeId
            FROM #tradeIdDuplicatesToIgnore
            GROUP BY strTradeId
        ) AS t2 ON t2.strTradeId = t.strTradeId
WHERE t.LegId IS NULL AND t2.cnt > 1

Output:

strTradeId
928
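
The self-join against the grouped subquery can also be written with a windowed COUNT, which attaches the per-TradeId row count to every row in one pass. This is an alternative technique, not part of the answer above, and only a sketch: the shredded rows are typed in as a hypothetical table variable @rows so it runs on its own, whereas in the thread they live in #tradeIdDuplicatesToIgnore.

```sql
-- Sketch only: @rows is a hypothetical stand-in for #tradeIdDuplicatesToIgnore
-- (one row per <value>, LegId NULL where the element was missing).
DECLARE @rows TABLE (strTradeId varchar(50), LegId int);
INSERT INTO @rows (strTradeId, LegId)
VALUES ('928', 1), ('928', 2), ('928', NULL),
       ('929', 1), ('929', 2), ('930', 2);

SELECT w.strTradeId
FROM
(
    SELECT  strTradeId
           ,LegId
            -- number of rows sharing this TradeId, repeated on every row of the group
           ,COUNT(*) OVER (PARTITION BY strTradeId) AS cnt
    FROM    @rows
) AS w
WHERE   w.LegId IS NULL  -- this <value> has no LegId
AND     w.cnt > 1;       -- and its TradeId appears more than once
```

For the sample rows this returns 928. The derived table is needed because a windowed aggregate cannot be referenced directly in a WHERE clause.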

Update 2:

;with T as (
    SELECT 
         e.value('TradeId[1]','varchar(50)') AS strTradeId
        ,e.value('LegId[1]','int') AS LegId
    FROM @xmlData.nodes('data/value') AS elements(e)
)
SELECT *
INTO #tradeIdDuplicatesToIgnore
FROM
(
    SELECT T.strTradeId
    FROM T
    GROUP BY T.strTradeId
    HAVING COUNT(*)>1 AND COUNT(*)>COUNT(T.LegId)
) AS t

SELECT * FROM #tradeIdDuplicatesToIgnore

You can use this query to get the duplicated TradeIds that include a NULL LegId:

;with cte_splitted as (
    select
        e.e.value('TradeId[1]','varchar(50)') as strTradeId,
        e.e.value('LegId[1]','int') as LegId
    from @xmlData.nodes('data/value') as e(e)
)
select
    c.strTradeId
into #tradeIdDuplicatesToIgnore
from cte_splitted as c
group by
    c.strTradeId
having
    count(*) > count(c.LegId) and -- more rows than non-NULL LegIds, i.e. at least one LegId is missing
    count(*) > 1 -- there is more than one record for this TradeId

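
The HAVING clause works because COUNT(*) counts every row while COUNT(LegId) skips NULLs, so the two differ exactly when some <value> for that TradeId has no LegId. A small sketch that prints both counts side by side, with the shredded rows typed in as a hypothetical table variable @rows for illustration:

```sql
-- Sketch: @rows is a hypothetical stand-in for the shredded XML rows.
DECLARE @rows TABLE (strTradeId varchar(50), LegId int);
INSERT INTO @rows (strTradeId, LegId)
VALUES ('928', 1), ('928', 2), ('928', NULL),
       ('929', 1), ('929', 2), ('930', 2);

SELECT  strTradeId
       ,COUNT(*)     AS all_rows       -- every row for the TradeId
       ,COUNT(LegId) AS rows_with_leg  -- rows whose LegId is NULL are not counted
FROM    @rows
GROUP BY strTradeId
ORDER BY strTradeId;
-- 928: all_rows 3, rows_with_leg 2  -> both HAVING conditions hold
-- 929: all_rows 2, rows_with_leg 2  -> no missing LegId
-- 930: all_rows 1, rows_with_leg 1  -> not a duplicate
```

SQL Server may also print "Warning: Null value is eliminated by an aggregate or other SET operation." here; that is expected, since ignoring the NULL is exactly what the query relies on.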