SQL

Question

First, I have this information:

Weight A
Weight B
Relationship of B to A: one-to-many

Joining the two gives the following result:

A_Id	Weight A	Weight B	B_Id
1	3	16	1
2	5	16	1
3	6	16	1
4	7	16	1
5	2	12	2
6	6	12	2

Now add two more columns, Sum Weight A By B_Id and Accumulative Difference (see table t2 below):

A_Id	Weight A	Sum Weight A By B_Id	Weight B	B_Id	Accumulative Diff
1	3	21	16	1	5
2	5	21	16	1	5
3	6	21	16	1	5
4	7	21	16	1	5
5	2	8	12	2	1
6	6	8	12	2	1

In the example above:

Accumulative diff on row 1 => 21 - 16 = 5
Accumulative diff on row 5 => (21 + 8) - (16 + 12) = 1
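The arithmetic above can be sketched in a few lines of Python. This only illustrates the definition and is not the query itself; the `rows` list and the function name `accumulative_diff` are made up for the sketch, and it assumes the rows arrive already sorted by B_Id.

```python
from itertools import groupby

# (A_Id, Weight A, Weight B, B_Id), copied from the first table.
# Assumption: rows are already sorted by B_Id.
rows = [
    (1, 3, 16, 1), (2, 5, 16, 1), (3, 6, 16, 1), (4, 7, 16, 1),
    (5, 2, 12, 2), (6, 6, 12, 2),
]

def accumulative_diff(rows):
    """Map each B_Id to (running sum of Weight A) - (running sum of Weight B)."""
    result = {}
    acc_a = acc_b = 0
    for b_id, group in groupby(rows, key=lambda r: r[3]):
        group = list(group)
        acc_a += sum(r[1] for r in group)  # every A row contributes its weight
        acc_b += group[0][2]               # Weight B is counted once per B_Id
        result[b_id] = acc_a - acc_b
    return result

print(accumulative_diff(rows))  # {1: 5, 2: 1}
```

Weight B is added once per B_Id rather than once per row, which is the same idea as the `[Row By B_Id] = 1` filter in the query.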

So my objective is to compute this 'Accumulative Difference'; the whole result is to be shown in a report.

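Technically, this can be done without any problem by using window functions. ~~First, I have to create 2 more columns: Accumulate Weight A By B_Id and Accumulate Weight B. Then I just take the difference between the two.~~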

I actually need 3 more columns:

[Row By B_Id]
[Accumulate Weight A By B_Id]
[Accumulate Weight B]

A_Id	Weight A	Sum Weight A By B_Id	Weight B	B_Id	Row By B_Id	Accumulate Weight A By B_Id	Accumulate Weight B	Accumulative Diff
1	3	21	16	1	1	21	16	5
2	5	21	16	1	2	21	16	5
3	6	21	16	1	3	21	16	5
4	7	21	16	1	4	21	16	5
5	2	8	12	2	1	29	28	1
6	6	8	12	2	2	29	28	1

Sample SQL (to generate t2):

SELECT
    *,
    [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (PARTITION BY ... ORDER BY B_Id),
    [Accumulate Weight B] = SUM(WeightB) OVER (PARTITION BY ... ORDER BY B_Id)
FROM t2
-- (...) could be a date or a year-month
-- Accumulate Weight B can be set on the first row only, etc.

;WITH tableA AS (
SELECT [A_Id] = 1, [Weight] = 3, [B_Id] = 1, [date] = '2021-10-01'
UNION
SELECT [A_Id] = 2, [Weight] = 5, [B_Id] = 1, [date] = '2021-10-02'
UNION
SELECT [A_Id] = 3, [Weight] = 6, [B_Id] = 1, [date] = '2021-10-03'
UNION
SELECT [A_Id] = 4, [Weight] = 7, [B_Id] = 1, [date] = '2021-10-04'
UNION
SELECT [A_Id] = 5, [Weight] = 2, [B_Id] = 2, [date] = '2021-10-05'
UNION
SELECT [A_Id] = 6, [Weight] = 6, [B_Id] = 2, [date] = '2021-10-06'
    
--Uncomment for testing NULL value
--UNION
--SELECT [A_Id] = 7, [Weight] = 9, [B_Id] = NULL, [date] = '2021-10-07'
--UNION
--SELECT [A_Id] = 8, [Weight] = 10, [B_Id] = 3, [date] = '2021-10-08'
    
),
tableB AS (
     SELECT [B_Id] = 1, [Weight] = 16, [date] = '2021-10-03'
     UNION
     SELECT [B_Id] = 2, [Weight] = 12, [date] = '2021-10-06'

    --Uncomment for testing NULL value
    --UNION
    --SELECT [B_Id] = 3, [Weight] = 8, [date] = '2021-10-08'
),
t1a AS (
    SELECT 
        [A_Id] = tableA.A_Id,
        [WeightA] = tableA.Weight,
        [WeightB] = tableB.Weight,
        [B_Id] = tableB.B_Id,
        [Row By B_Id] = ROW_NUMBER() OVER(PARTITION BY tableB.B_Id ORDER BY A_Id)
    FROM 
        tableA 
    FULL JOIN tableB ON tableA.B_Id = tableB.B_Id
),
t1b AS (
    SELECT
        *,
        [Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY B_Id),
        [Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY B_Id)
    FROM t1a
),
t2 AS (
    SELECT 
        *,
        [Accumulate Difference] = [Sum Weight A By B_Id] - [Accumulate Weight B]
    FROM t1b
)
SELECT 
    *
FROM t2

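Now the problem arises when one of the B_Id values is NULL. (Uncomment the commented-out sections to generate a NULL B_Id.)

Below is my expected result, especially on the highlighted rows: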

A_Id	Weight A	Sum Weight A By B_Id	Weight B	B_Id	Accumulate Weight A By B_Id	Accumulate Weight B	Accumulative Diff
1	3	21	16	1	21	16	5
2	5	21	16	1	21	16	5
3	6	21	16	1	21	16	5
4	7	21	16	1	21	16	5
5	2	8	12	2	29	28	1
6	6	8	12	2	29	28	1
7	9	9	0	NULL	38	28	10
8	7	10	8	3	48	36	12
9	3	10	8	3	48	36	12

However, my sample query does not produce this. Instead, the following happens:

The NULL B_Id appears in the first row. (The order is messed up.)
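The reason is that an ascending ORDER BY puts NULL before every other value, so a running sum ordered by B_Id starts with the NULL group. A minimal sketch of that sorting behaviour follows, using Python's built-in `sqlite3` module as a stand-in for SQL Server (an assumption made so the snippet is runnable anywhere; SQLite happens to sort NULL first in ascending order as well).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three B_Id values, one of them NULL, sorted ascending.
ordered = [row[0] for row in conn.execute(
    "SELECT b_id FROM ("
    "  SELECT 1 AS b_id UNION ALL SELECT NULL UNION ALL SELECT 2"
    ") ORDER BY b_id"
)]

print(ordered)  # [None, 1, 2] -> the NULL row comes first
```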

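So my question is: how do I handle this situation? (Keep the original row order, as in the expected result.)

Why is the order like this? (asked by @ThorstenKettner)

The default order is based on B_TransactionDatetime. If B_Id is NULL, it is based on A_TransactionDatetime instead. So I computed another column, RefDateTime = COALESCE(B_TransactionDatetime, A_TransactionDatetime), and sort by that column.

PS:

Inspired by @ThorstenKettner, I should use RefDateTime in the window functions, i.e.: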

[Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY RefDateTime),
[Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY RefDateTime)

Case closed.
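For anyone who wants to try the RefDateTime approach without a SQL Server instance, here is a rough sketch that runs the same idea in SQLite through Python's `sqlite3` module. Several things are assumptions: SQLite 3.25 or later (needed for window functions), the sample `[date]` columns standing in for A_TransactionDatetime and B_TransactionDatetime, a LEFT JOIN in place of the FULL JOIN (every B in the sample has at least one A), and the names `RowByBId` and `AccDiff`, which are shortened for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableA (A_Id INTEGER, Weight INTEGER, B_Id INTEGER, date TEXT);
CREATE TABLE tableB (B_Id INTEGER, Weight INTEGER, date TEXT);
INSERT INTO tableA VALUES
  (1, 3, 1, '2021-10-01'), (2, 5, 1, '2021-10-02'), (3, 6, 1, '2021-10-03'),
  (4, 7, 1, '2021-10-04'), (5, 2, 2, '2021-10-05'), (6, 6, 2, '2021-10-06'),
  (7, 9, NULL, '2021-10-07'), (8, 10, 3, '2021-10-08');
INSERT INTO tableB VALUES
  (1, 16, '2021-10-03'), (2, 12, '2021-10-06'), (3, 8, '2021-10-08');
""")

query = """
WITH t1a AS (
    SELECT a.A_Id,
           a.Weight AS WeightA,
           b.Weight AS WeightB,
           -- fall back to A's date when there is no B
           COALESCE(b.date, a.date) AS RefDateTime,
           ROW_NUMBER() OVER (PARTITION BY b.B_Id ORDER BY a.A_Id) AS RowByBId
    FROM tableA a
    LEFT JOIN tableB b ON a.B_Id = b.B_Id
)
SELECT A_Id,
       SUM(WeightA) OVER (ORDER BY RefDateTime)
       - SUM(CASE WHEN RowByBId = 1 THEN WeightB ELSE 0 END)
             OVER (ORDER BY RefDateTime) AS AccDiff
FROM t1a
ORDER BY RefDateTime, A_Id
"""

acc_diff = conn.execute(query).fetchall()
for a_id, diff in acc_diff:
    print(a_id, diff)
```

Because the running sums are ordered by RefDateTime rather than B_Id, the NULL B_Id row falls between B_Id 2 and B_Id 3 instead of at the top.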

Answer 1

You can use COALESCE().

SELECT 
 *,
 [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (PARTITION BY B_id ORDER BY B_Id),
 [Accumulate Weight B] = SUM(WeightB) OVER (PARTITION BY B_id ORDER BY B_Id),
 SUM(coalesce(WeightA,0)-coalesce(WeightB,0)) OVER (PARTITION BY B_id ORDER BY B_Id) difference

FROM t2

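PS: Your original query actually looks wrong to me; if it is correct, then fine. Maybe you should provide sample data for A and B. To me, it makes more sense to SUM() before joining them.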

Answer 2

You will have to make changes, but this should help.`

SELECT [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (
        PARTITION BY...ORDER BY B_Id
        )
    ,[Accumulate Weight B] = SUM(WeightB) OVER (
        PARTITION BY...ORDER BY B_Id
        )
FROM t2
WHERE B_Id IS NOT NULL

UNION

SELECT [Accumulate Weight A By B_Id] = SUM(TAB.WeightA) OVER (
        PARTITION BY TAB.ROW_NUM ORDER BY B_Id
        )
    ,[Accumulate Weight B] = SUM(TAB.WeightB) OVER (
        PARTITION BY TAB.ROW_NUM ORDER BY B_Id
        )
FROM (
    SELECT WeightA
        ,WeightB
        ,B_Id
        ,ROW_NUMBER() OVER (
            ORDER BY B_ID
            ) AS ROW_NUM
    FROM T2
    WHERE B_ID IS NULL
    ) TAB

`

Answer 3

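You want to outer join B to A, because not every A has an associated B.

Then you look at the rows block by block. A block is either all rows belonging to one B, or a single A row that has no B. b_id makes a good group key for the former, and a_id suits the latter. For a combined key there are different options. COALESCE(b_id, a_id) is not one of them, because the result set can contain an a_id 1 and a b_id 1, and we don't want those in the same group. One solution is a simple COALESCE(b_id, -a_id), provided of course that your IDs cannot be negative.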
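To make the collision concrete, here is a small Python sketch. The `coalesce` helper and the three `(a_id, b_id)` pairs are hypothetical and exist only for this illustration.

```python
def coalesce(*values):
    """Return the first value that is not None, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)

# Hypothetical (a_id, b_id) pairs: a_id 1 has no B, a_id 2 and 3 belong to b_id 1.
rows = [(1, None), (2, 1), (3, 1)]

naive_keys = [coalesce(b_id, a_id) for a_id, b_id in rows]
safe_keys = [coalesce(b_id, -a_id) for a_id, b_id in rows]

print(naive_keys)  # [1, 1, 1]  -> the orphan A row is merged into B group 1
print(safe_keys)   # [-1, 1, 1] -> the orphan A row gets its own group
```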

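Now, all your calculations are based on the aggregated groups, i.e. you are not interested in the individual A values when they belong to a B group. For this reason I would aggregate right away and only join the individual A rows again at the end.

The order of the rows is COALESCE(b_date, a_date).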

with grouped as
(
  select
    coalesce(b.b_id, -a.a_id) as grp_id,
    max(coalesce(b.date, a.date)) as grp_date,
    coalesce(max(b.weight), 0) as b_weight,
    sum(a.weight) as a_weight
  from a
  left join b on b.b_id = a.b_id
  group by coalesce(b.b_id, -a.a_id)
)
, calculated as
(
  select
    grp_id,
    grp_date,
    b_weight,
    a_weight,
    sum(a_weight - b_weight) over (order by grp_date) as running_diff
  from grouped
)
select *
from calculated c
join a on a.b_id = c.grp_id or a.a_id = -c.grp_id
order by c.grp_date, a.date;

Hope this works. I don't have a computer at hand and had to type this on my phone, which turned out to be harder than I expected :-)
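As a quick check of the aggregate-first query, the sketch below runs it against the question's sample data (with the NULL test rows included) in SQLite through Python's `sqlite3` module. This is an assumption-laden test harness rather than the SQL Server setup from the question: it needs SQLite 3.25 or later for window functions, and the tables are created as `a` and `b` to match the names used in this answer. Only `a_id` and `running_diff` are selected, to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (a_id INTEGER, weight INTEGER, b_id INTEGER, date TEXT);
CREATE TABLE b (b_id INTEGER, weight INTEGER, date TEXT);
INSERT INTO a VALUES
  (1, 3, 1, '2021-10-01'), (2, 5, 1, '2021-10-02'), (3, 6, 1, '2021-10-03'),
  (4, 7, 1, '2021-10-04'), (5, 2, 2, '2021-10-05'), (6, 6, 2, '2021-10-06'),
  (7, 9, NULL, '2021-10-07'), (8, 10, 3, '2021-10-08');
INSERT INTO b VALUES
  (1, 16, '2021-10-03'), (2, 12, '2021-10-06'), (3, 8, '2021-10-08');
""")

query = """
WITH grouped AS (
  -- one row per B, or per A row that has no B (negative key)
  SELECT COALESCE(b.b_id, -a.a_id)     AS grp_id,
         MAX(COALESCE(b.date, a.date)) AS grp_date,
         COALESCE(MAX(b.weight), 0)    AS b_weight,
         SUM(a.weight)                 AS a_weight
  FROM a
  LEFT JOIN b ON b.b_id = a.b_id
  GROUP BY COALESCE(b.b_id, -a.a_id)
), calculated AS (
  SELECT grp_id, grp_date,
         SUM(a_weight - b_weight) OVER (ORDER BY grp_date) AS running_diff
  FROM grouped
)
SELECT a.a_id, c.running_diff
FROM calculated c
JOIN a ON a.b_id = c.grp_id OR a.a_id = -c.grp_id
ORDER BY c.grp_date, a.date
"""

running = conn.execute(query).fetchall()
for a_id, diff in running:
    print(a_id, diff)
```

On this data the running difference per group comes out as 5, 1, 10 and 12, which lines up with the Accumulative Diff column in the question's expected result.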

SQL - How to do Window function if there is NULL value?

sql-server

window-functions