SQL - How to use a window function when there is a NULL value?

First, I have this information:

  1. Weight A
  2. Weight B
  3. Relationship of B to A: one-to-many

From this, the following result can be obtained:

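| A_Id | Weight A | Weight B | B_Id |
|------|----------|----------|------|
| 1    | 3        | 16       | 1    |
| 2    | 5        | 16       | 1    |
| 3    | 6        | 16       | 1    |
| 4    | 7        | 16       | 1    |
| 5    | 2        | 12       | 2    |
| 6    | 6        | 12       | 2    |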

Now add two more columns, Sum Weight A By B_Id and Accumulative Difference (consider the table t2 below):

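| A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Accumulative Diff |
|------|----------|----------------------|----------|------|-------------------|
| 1    | 3        | 21                   | 16       | 1    | 5                 |
| 2    | 5        | 21                   | 16       | 1    | 5                 |
| 3    | 6        | 21                   | 16       | 1    | 5                 |
| 4    | 7        | 21                   | 16       | 1    | 5                 |
| 5    | 2        | 8                    | 12       | 2    | 1                 |
| 6    | 6        | 8                    | 12       | 2    | 1                 |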

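Taking the example above:

  1. Accumulative difference for the first row => 21 - 16 = 5

  2. Accumulative difference for the fifth row => (21 + 8) - (16 + 12) = 1

So my objective is to calculate this 'Accumulative Difference'; the whole result will be shown in a report.

Technically, this can be achieved without any problem by using window functions. First, I have to create 2 more columns: Accumulate Weight A By B_Id and Accumulate Weight B. Then I just take the difference between the two.

In practice I need 3 more columns: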

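| A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Row By B_Id | Accumulate Weight A By B_Id | Accumulate Weight B | Accumulative Diff |
|------|----------|----------------------|----------|------|-------------|-----------------------------|---------------------|-------------------|
| 1    | 3        | 21                   | 16       | 1    | 1           | 21                          | 16                  | 5                 |
| 2    | 5        | 21                   | 16       | 1    | 2           | 21                          | 16                  | 5                 |
| 3    | 6        | 21                   | 16       | 1    | 3           | 21                          | 16                  | 5                 |
| 4    | 7        | 21                   | 16       | 1    | 4           | 21                          | 16                  | 5                 |
| 5    | 2        | 8                    | 12       | 2    | 1           | 29                          | 28                  | 1                 |
| 6    | 6        | 8                    | 12       | 2    | 2           | 29                          | 28                  | 1                 |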

Sample SQL (to generate t2):

SELECT
    *,
    [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (PARTITION BY ... ORDER BY B_Id),
    [Accumulate Weight B] = SUM(WeightB) OVER (PARTITION BY ... ORDER BY B_Id)
FROM t2
-- (...) could be by date year/month
-- Accumulate Weight B could be set on the first row only, etc.

;WITH tableA AS (
SELECT [A_Id] = 1, [Weight] = 3, [B_Id] = 1, [date] = '2021-10-01'
UNION
SELECT [A_Id] = 2, [Weight] = 5, [B_Id] = 1, [date] = '2021-10-02'
UNION
SELECT [A_Id] = 3, [Weight] = 6, [B_Id] = 1, [date] = '2021-10-03'
UNION
SELECT [A_Id] = 4, [Weight] = 7, [B_Id] = 1, [date] = '2021-10-04'
UNION
SELECT [A_Id] = 5, [Weight] = 2, [B_Id] = 2, [date] = '2021-10-05'
UNION
SELECT [A_Id] = 6, [Weight] = 6, [B_Id] = 2, [date] = '2021-10-06'
    
--Uncomment for testing NULL value
--UNION
--SELECT [A_Id] = 7, [Weight] = 9, [B_Id] = NULL, [date] = '2021-10-07'
--UNION
--SELECT [A_Id] = 8, [Weight] = 10, [B_Id] = 3, [date] = '2021-10-08'
    
),
tableB AS (
     SELECT [B_Id] = 1, [Weight] = 16, [date] = '2021-10-03'
     UNION
     SELECT [B_Id] = 2, [Weight] = 12, [date] = '2021-10-06'

    --Uncomment for testing NULL value
    --UNION
    --SELECT [B_Id] = 3, [Weight] = 8, [date] = '2021-10-08'
),
t1a AS (
    SELECT 
        [A_Id] = tableA.A_Id,
        [WeightA] = tableA.Weight,
        [WeightB] = tableB.Weight,
        [B_Id] = tableB.B_Id,
        [Row By B_Id] = ROW_NUMBER() OVER(PARTITION BY tableB.B_Id ORDER BY A_Id)
    FROM 
        tableA 
    FULL JOIN tableB ON tableA.B_Id = tableB.B_Id
),
t1b AS (
    SELECT
        *,
        [Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY B_Id),
        [Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY B_Id)
    FROM t1a
),
t2 AS (
    SELECT 
        *,
        [Accumulate Difference] = [Sum Weight A By B_Id] - [Accumulate Weight B]
    FROM t1b
)
SELECT 
    *
FROM t2

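Now the problem arises when one of the B_Id values is NULL. (Uncomment the marked sections to generate a NULL B_Id.)

Below is my expected result, especially on the highlighted rows: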

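| A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Accumulate Weight A By B_Id | Accumulate Weight B | Accumulative Diff |
|------|----------|----------------------|----------|------|-----------------------------|---------------------|-------------------|
| 1    | 3        | 21                   | 16       | 1    | 21                          | 16                  | 5                 |
| 2    | 5        | 21                   | 16       | 1    | 21                          | 16                  | 5                 |
| 3    | 6        | 21                   | 16       | 1    | 21                          | 16                  | 5                 |
| 4    | 7        | 21                   | 16       | 1    | 21                          | 16                  | 5                 |
| 5    | 2        | 8                    | 12       | 2    | 29                          | 28                  | 1                 |
| 6    | 6        | 8                    | 12       | 2    | 29                          | 28                  | 1                 |
| 7    | 9        | 9                    | 0        | NULL | 38                          | 28                  | 10                |
| 8    | 7        | 10                   | 8        | 3    | 48                          | 36                  | 12                |
| 9    | 3        | 10                   | 8        | 3    | 48                          | 36                  | 12                |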

However, my sample query does not produce this. Instead, the following happens:

The row with the NULL B_Id appears first. (The order is messed up.)
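
The reordering can be reproduced in isolation. SQL Server treats NULL as the lowest possible value in an ascending ORDER BY, so a window ordered by B_Id accumulates the NULL row before everything else. The snippet below is a minimal sketch on hypothetical toy data (one row per block, where `w` stands for that block's total Weight A); it is not part of the original query.

```sql
-- Toy data (hypothetical): one row per block, w is that block's total Weight A.
SELECT
    B_Id,
    w,
    running_w = SUM(w) OVER (ORDER BY B_Id)
FROM (VALUES
    (1, 21),
    (2, 8),
    (CAST(NULL AS int), 9),   -- the A row that has no B
    (3, 10)
) AS t (B_Id, w)
ORDER BY B_Id;
-- The NULL row comes back first with running_w = 9,
-- followed by B_Id 1 (30), 2 (38) and 3 (48).
```

So the running totals are correct for the order the window was given; the order itself is what needs to change.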

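So my question is: how do I handle this situation? (The original row order should be kept, as in the expected result.)

Why is the order like this? (asked by @ThorstenKettner)

The default order is based on B_TransactionDatetime. If B_Id is NULL, it is based on A_TransactionDatetime instead. So I calculated another column, RefDateTime = COALESCE(B_TransactionDatetime, A_TransactionDatetime), and sort by that column.

PS:

Inspired by @ThorstenKettner, I should use RefDateTime in the window functions, i.e.: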

[Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY RefDateTime),
[Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY RefDateTime)
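
Applied to the sample data above (with the NULL test rows enabled), the idea might look like the sketch below. Several things in it are assumptions rather than part of the original query: the sample tables only have a single [date] column each, so RefDateTime is derived here as COALESCE(tableB.[date], tableA.[date]); the VALUES lists merely stand in for tableA and tableB; and a LEFT JOIN is used instead of the FULL JOIN, since every row of interest comes from tableA.

```sql
-- Sketch only: RefDateTime is assumed to be COALESCE(tableB.[date], tableA.[date]).
-- Dates are kept as ISO-formatted strings, which sort correctly as text.
WITH tableA AS (
    SELECT * FROM (VALUES
        (1,  3, 1,                 '2021-10-01'),
        (2,  5, 1,                 '2021-10-02'),
        (3,  6, 1,                 '2021-10-03'),
        (4,  7, 1,                 '2021-10-04'),
        (5,  2, 2,                 '2021-10-05'),
        (6,  6, 2,                 '2021-10-06'),
        (7,  9, CAST(NULL AS int), '2021-10-07'),
        (8, 10, 3,                 '2021-10-08')
    ) AS v (A_Id, Weight, B_Id, [date])
),
tableB AS (
    SELECT * FROM (VALUES
        (1, 16, '2021-10-03'),
        (2, 12, '2021-10-06'),
        (3,  8, '2021-10-08')
    ) AS v (B_Id, Weight, [date])
),
t1a AS (
    SELECT
        [A_Id]        = a.A_Id,
        [WeightA]     = a.Weight,
        [WeightB]     = COALESCE(b.Weight, 0),           -- no B means weight 0
        [B_Id]        = b.B_Id,
        [RefDateTime] = COALESCE(b.[date], a.[date]),    -- B's date, else A's date
        [Row By B_Id] = ROW_NUMBER() OVER (PARTITION BY b.B_Id ORDER BY a.A_Id)
    FROM tableA AS a
    LEFT JOIN tableB AS b ON a.B_Id = b.B_Id
)
SELECT
    A_Id, WeightA, WeightB, B_Id,
    [Accumulate Weight A]   = SUM(WeightA) OVER (ORDER BY RefDateTime),
    -- count each B weight once, on the first row of its group
    [Accumulate Weight B]   = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END)
                                  OVER (ORDER BY RefDateTime),
    [Accumulate Difference] = SUM(WeightA) OVER (ORDER BY RefDateTime)
                            - SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END)
                                  OVER (ORDER BY RefDateTime)
FROM t1a
ORDER BY RefDateTime, A_Id;
```

With this data the accumulated difference should come out as 5 for the rows of B_Id 1, 1 for B_Id 2, 10 for the NULL row and 12 for B_Id 3. Note that all rows sharing a RefDateTime are summed together as peers, which is what gives every row of a B group the same totals; it also means two different blocks with an identical RefDateTime would be merged, so a tie-breaker may be needed on real data.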

Case closed.

You can use coalesce().

SELECT 
 *,
 [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (PARTITION BY B_id ORDER BY B_Id),
 [Accumulate Weight B] = SUM(WeightB) OVER (PARTITION BY B_id ORDER BY B_Id),
 SUM(coalesce(WeightA,0)-coalesce(WeightB,0)) OVER (PARTITION BY B_id ORDER BY B_Id) difference

FROM t2

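PS: Actually, your original query looks wrong to me; if it is correct, then fine. Maybe you should provide sample data for A and B. To me it makes more sense to sum() before joining them.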

You will have to make changes, but this should help.

SELECT [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (
        PARTITION BY...ORDER BY B_Id
        )
    ,[Accumulate Weight B] = SUM(WeightB) OVER (
        PARTITION BY...ORDER BY B_Id
        )
FROM t2
WHERE B_Id IS NOT NULL

UNION

SELECT [Accumulate Weight A By B_Id] = SUM(TAB.WeightA) OVER (
        PARTITION BY TAB.ROW_NUM ORDER BY B_Id
        )
    ,[Accumulate Weight B] = SUM(TAB.WeightB) OVER (
        PARTITION BY TAB.ROW_NUM ORDER BY B_Id
        )
FROM (
    SELECT WeightA
        ,WeightB
        ,B_Id
        ,ROW_NUMBER() OVER (
            ORDER BY B_ID
            ) AS ROW_NUM
    FROM T2
    WHERE B_ID IS NULL
    ) TAB

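You want to outer join B to A, because not every A has an associated B.

Then you look at the rows block by block. A block is either all rows belonging to one B, or a single A row that has no B. b_id makes a good group key for the former, and a_id suits the latter. For a combined key there are different options. COALESCE(b_id, a_id) is not one of them, because the result set can contain both an a_id 1 and a b_id 1, and we don't want them in the same group. One solution is a simple COALESCE(b_id, -a_id), provided of course that your IDs cannot be negative.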
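
The difference between the two candidate keys is easy to see on a few rows. The following is a small illustration on made-up rows (the VALUES list and the column aliases `colliding_key` and `safe_key` are hypothetical, not taken from the question):

```sql
-- Made-up rows: A row 1 has no B, A rows 2 and 3 both belong to B 1.
SELECT
    a_id,
    b_id,
    colliding_key = COALESCE(b_id, a_id),
    safe_key      = COALESCE(b_id, -a_id)
FROM (VALUES
    (1, CAST(NULL AS int)),
    (2, 1),
    (3, 1)
) AS a (a_id, b_id)
ORDER BY a_id;
-- a_id 1:       colliding_key = 1 (same as B group 1), safe_key = -1
-- a_id 2 and 3: colliding_key = 1,                     safe_key = 1
```

With COALESCE(b_id, a_id), the B-less row would be lumped into B group 1; negating a_id keeps it in a block of its own as long as all IDs are positive.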

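Now, all your calculations are based on the aggregated groups, i.e. you are not interested in individual A values when they belong to a B group. For this reason I would aggregate right away and only join the individual A rows again at the end.

The order of the rows is COALESCE(b_date, a_date).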

with grouped as
(
  -- one row per block: a whole B group, or a single A row without a B
  select
    coalesce(b.b_id, -a.a_id) as grp_id,
    max(coalesce(b.date, a.date)) as grp_date,
    coalesce(max(b.weight), 0) as b_weight,
    sum(a.weight) as a_weight
  from a
  left join b on b.b_id = a.b_id
  group by coalesce(b.b_id, -a.a_id)
)
, calculated as
(
  -- running difference over the blocks, in date order
  select
    grp_id,
    grp_date,
    b_weight,
    a_weight,
    sum(a_weight - b_weight) over (order by grp_date) as running_diff
  from grouped
)
-- join the individual A rows back to their block
select *
from calculated c
join a on a.b_id = c.grp_id or a.a_id = -c.grp_id
order by c.grp_date, a.date;

I hope this works. I don't have a computer at hand and typed this on my phone, which turned out to be harder than I expected :-)