SQL - How to use a window function when there is a NULL value?
First, I have this information:
- Weight A
- Weight B
- Relationship of B to A: one-to-many
From this, the following result can be obtained:
A_Id | Weight A | Weight B | B_Id |
---|---|---|---|
1 | 3 | 16 | 1 |
2 | 5 | 16 | 1 |
3 | 6 | 16 | 1 |
4 | 7 | 16 | 1 |
5 | 2 | 12 | 2 |
6 | 6 | 12 | 2 |
Now, add two more columns: Sum Weight A By B_Id and Accumulative Difference (consider the table t2 below).
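A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Accumulative Diff |
---|---|---|---|---|---|
1 | 3 | 21 | 16 | 1 | 5 |
2 | 5 | 21 | 16 | 1 | 5 |
3 | 6 | 21 | 16 | 1 | 5 |
4 | 7 | 21 | 16 | 1 | 5 |
5 | 2 | 8 | 12 | 2 | 1 |
6 | 6 | 8 | 12 | 2 | 1 |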
Taking the example above:
Accumulative difference on the first row => 21 - 16 = 5
Accumulative difference on the fifth row => (21 + 8) - (16 + 12) = 1
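The arithmetic above can be sketched as a running sum over per-group totals. This is only an illustration: the inline `VALUES` rows and the names `g`, `SumWeightA` and `AccumulativeDiff` are assumptions, and the totals are copied from the table rather than computed.

```sql
-- Hypothetical per-group totals taken from the table above.
SELECT
    g.B_Id,
    g.SumWeightA,
    g.WeightB,
    -- Running total of (A - B) across groups, in B_Id order.
    AccumulativeDiff = SUM(g.SumWeightA - g.WeightB) OVER (ORDER BY g.B_Id)
FROM (VALUES
    (1, 21, 16),
    (2,  8, 12)
) AS g (B_Id, SumWeightA, WeightB)
ORDER BY g.B_Id;
-- B_Id 1: 21 - 16 = 5
-- B_Id 2: (21 + 8) - (16 + 12) = 1
```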
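So my objective is to calculate this 'Accumulative Difference', and the whole result has to be shown in a report.
Technically, this can be achieved without any problem by using window functions.
First, I have to create two more columns: Accumulate Weight A By B_Id and Accumulate Weight B. Then it is just a matter of taking the difference between the two.
I actually need three more columns:
- [Row By B_Id]
- [Accumulate Weight A By B_Id]
- [Accumulate Weight B]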
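A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Row By B_Id | Accumulate Weight A By B_Id | Accumulate Weight B | Accumulative Diff |
---|---|---|---|---|---|---|---|---|
1 | 3 | 21 | 16 | 1 | 1 | 21 | 16 | 5 |
2 | 5 | 21 | 16 | 1 | 2 | 21 | 16 | 5 |
3 | 6 | 21 | 16 | 1 | 3 | 21 | 16 | 5 |
4 | 7 | 21 | 16 | 1 | 4 | 21 | 16 | 5 |
5 | 2 | 8 | 12 | 2 | 1 | 29 | 28 | 1 |
6 | 6 | 8 | 12 | 2 | 2 | 29 | 28 | 1 |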
Sample SQL (generating t2):
SELECT
    *,
    [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (PARTITION BY ... ORDER BY B_Id),
    [Accumulate Weight B] = SUM(WeightB) OVER (PARTITION BY ... ORDER BY B_Id)
FROM t2
-- (...) could be by date, e.g. year-month
-- Accumulate Weight B can be set on the first row only, etc.
;WITH tableA AS (
    SELECT [A_Id] = 1, [Weight] = 3, [B_Id] = 1, [date] = '2021-10-01'
    UNION
    SELECT [A_Id] = 2, [Weight] = 5, [B_Id] = 1, [date] = '2021-10-02'
    UNION
    SELECT [A_Id] = 3, [Weight] = 6, [B_Id] = 1, [date] = '2021-10-03'
    UNION
    SELECT [A_Id] = 4, [Weight] = 7, [B_Id] = 1, [date] = '2021-10-04'
    UNION
    SELECT [A_Id] = 5, [Weight] = 2, [B_Id] = 2, [date] = '2021-10-05'
    UNION
    SELECT [A_Id] = 6, [Weight] = 6, [B_Id] = 2, [date] = '2021-10-06'
    --Uncomment for testing NULL value
    --UNION
    --SELECT [A_Id] = 7, [Weight] = 9, [B_Id] = NULL, [date] = '2021-10-07'
    --UNION
    --SELECT [A_Id] = 8, [Weight] = 10, [B_Id] = 3, [date] = '2021-10-08'
),
tableB AS (
    SELECT [B_Id] = 1, [Weight] = 16, [date] = '2021-10-03'
    UNION
    SELECT [B_Id] = 2, [Weight] = 12, [date] = '2021-10-06'
    --Uncomment for testing NULL value
    --UNION
    --SELECT [B_Id] = 3, [Weight] = 8, [date] = '2021-10-08'
),
t1a AS (
    SELECT
        [A_Id] = tableA.A_Id,
        [WeightA] = tableA.Weight,
        [WeightB] = tableB.Weight,
        [B_Id] = tableB.B_Id,
        [Row By B_Id] = ROW_NUMBER() OVER(PARTITION BY tableB.B_Id ORDER BY A_Id)
    FROM
        tableA
        FULL JOIN tableB ON tableA.B_Id = tableB.B_Id
),
t1b AS (
    SELECT
        *,
        [Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY B_Id),
        [Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY B_Id)
    FROM t1a
),
t2 AS (
    SELECT
        *,
        [Accumulate Difference] = [Sum Weight A By B_Id] - [Accumulate Weight B]
    FROM t1b
)
SELECT
    *
FROM t2
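Now the problem arises when one of the B_Id values is NULL. (Uncomment the marked sections to generate a NULL B_Id.)
Below is my expected result, especially on the highlighted rows: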
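A_Id | Weight A | Sum Weight A By B_Id | Weight B | B_Id | Accumulate Weight A By B_Id | Accumulate Weight B | Accumulative Diff |
---|---|---|---|---|---|---|---|
1 | 3 | 21 | 16 | 1 | 21 | 16 | 5 |
2 | 5 | 21 | 16 | 1 | 21 | 16 | 5 |
3 | 6 | 21 | 16 | 1 | 21 | 16 | 5 |
4 | 7 | 21 | 16 | 1 | 21 | 16 | 5 |
5 | 2 | 8 | 12 | 2 | 29 | 28 | 1 |
6 | 6 | 8 | 12 | 2 | 29 | 28 | 1 |
7 | 9 | 9 | 0 | NULL | 38 | 28 | 10 |
8 | 7 | 10 | 8 | 3 | 48 | 36 | 12 |
9 | 3 | 10 | 8 | 3 | 48 | 36 | 12 |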
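However, this does not work with my sample query. Instead, the following happens:
The NULL B_Id comes out as the first row. (The order is messed up.)
So my question is: how do I handle this situation? (The rows should stay in their original order, as in the expected result.)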
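The reordering comes from how SQL Server sorts NULL: in an ascending ORDER BY, NULL sorts before every non-NULL value, and that applies to the ORDER BY inside OVER (...) as well. A minimal sketch with made-up inline rows (the `VALUES` data and the names `v` and `RunningA` are illustrative, not the real tables):

```sql
SELECT
    v.A_Id,
    v.B_Id,
    v.WeightA,
    -- With ORDER BY and no explicit frame, the default frame is
    -- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW,
    -- so rows that tie on B_Id share one running total.
    RunningA = SUM(v.WeightA) OVER (ORDER BY v.B_Id)
FROM (VALUES
    (5, 2,    2),
    (6, 2,    6),
    (7, NULL, 9),
    (8, 3,   10)
) AS v (A_Id, B_Id, WeightA)
ORDER BY v.A_Id;
-- A_Id 5 -> 17, A_Id 6 -> 17  (the NULL row's 9 is already included)
-- A_Id 7 -> 9                 (NULL sorts first, so it starts the running sum)
-- A_Id 8 -> 27
```

The NULL row is accumulated first instead of between B_Id 2 and B_Id 3, which is the misordering described above.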
Why is the order like this? (asked by @ThorstenKettner)
The default order is based on B_TransactionDatetime. If B_Id is NULL, it is based on A_TransactionDatetime instead. So I computed another column, RefDateTime = COALESCE(B_TransactionDatetime, A_TransactionDatetime), and sort by that column.
PS:
Inspired by @ThorstenKettner, I should use RefDateTime in the window functions, i.e.:
[Sum Weight A By B_Id] = SUM(WeightA) OVER (ORDER BY RefDateTime),
[Accumulate Weight B] = SUM(CASE WHEN [Row By B_Id] = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY RefDateTime)
Case closed.
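For reference, here is a self-contained sketch of that fix on the sample data, with the NULL rows enabled. It rests on several assumptions: the `[date]` columns of `tableA` and `tableB` stand in for A_TransactionDatetime and B_TransactionDatetime, a LEFT JOIN replaces the FULL JOIN of the original query, and the names `joined`, `running`, `RefDate`, `RowByB`, `AccA`, `AccB` and `AccDiff` are illustrative. It also assumes every block has its own reference date, because rows that tie on `RefDate` share one running total.

```sql
WITH tableA AS (
    SELECT * FROM (VALUES
        (1,  3, 1,    '2021-10-01'),
        (2,  5, 1,    '2021-10-02'),
        (3,  6, 1,    '2021-10-03'),
        (4,  7, 1,    '2021-10-04'),
        (5,  2, 2,    '2021-10-05'),
        (6,  6, 2,    '2021-10-06'),
        (7,  9, NULL, '2021-10-07'),
        (8, 10, 3,    '2021-10-08')
    ) AS v (A_Id, Weight, B_Id, [date])
),
tableB AS (
    SELECT * FROM (VALUES
        (1, 16, '2021-10-03'),
        (2, 12, '2021-10-06'),
        (3,  8, '2021-10-08')
    ) AS v (B_Id, Weight, [date])
),
joined AS (
    SELECT
        a.A_Id,
        WeightA = a.Weight,
        WeightB = COALESCE(b.Weight, 0),  -- an A row without a B contributes 0
        B_Id    = b.B_Id,
        -- Sort key: the B date when there is a B, otherwise the A date.
        RefDate = CAST(COALESCE(b.[date], a.[date]) AS date),
        RowByB  = ROW_NUMBER() OVER (PARTITION BY b.B_Id ORDER BY a.A_Id)
    FROM tableA AS a
    LEFT JOIN tableB AS b ON b.B_Id = a.B_Id
),
running AS (
    SELECT
        *,
        AccA = SUM(WeightA) OVER (ORDER BY RefDate),
        -- Count each B weight once, on the first row of its group.
        AccB = SUM(CASE WHEN RowByB = 1 THEN WeightB ELSE 0 END) OVER (ORDER BY RefDate)
    FROM joined
)
SELECT A_Id, WeightA, WeightB, B_Id, AccA, AccB, AccDiff = AccA - AccB
FROM running
ORDER BY RefDate, A_Id;
```

On this data the row with the NULL B_Id (A_Id 7) should come out as 38 / 28 / 10 and A_Id 8 as 48 / 36 / 12, in line with the expected table.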
You can use coalesce().
SELECT
    *,
    [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (PARTITION BY B_id ORDER BY B_Id),
    [Accumulate Weight B] = SUM(WeightB) OVER (PARTITION BY B_id ORDER BY B_Id),
    SUM(coalesce(WeightA,0)-coalesce(WeightB,0)) OVER (PARTITION BY B_id ORDER BY B_Id) difference
FROM t2
PS: Actually, your initial query looks wrong to me; if it is in fact correct, then fine.
Maybe you should provide sample data for A and B. To me, it makes more sense to sum() before joining them.
You will have to make changes, but this should help.
SELECT [Accumulate Weight A By B_Id] = SUM(WeightA) OVER (
        PARTITION BY ... ORDER BY B_Id
    )
    ,[Accumulate Weight B] = SUM(WeightB) OVER (
        PARTITION BY ... ORDER BY B_Id
    )
FROM t2
WHERE B_Id IS NOT NULL
UNION
SELECT [Accumulate Weight A By B_Id] = SUM(TAB.WeightA) OVER (
        PARTITION BY TAB.ROW_NUM ORDER BY B_Id
    )
    ,[Accumulate Weight B] = SUM(TAB.WeightB) OVER (
        PARTITION BY TAB.ROW_NUM ORDER BY B_Id
    )
FROM (
    SELECT WeightA
        ,WeightB
        ,B_Id
        ,ROW_NUMBER() OVER (
            ORDER BY B_ID
        ) AS ROW_NUM
    FROM T2
    WHERE B_ID IS NULL
) TAB
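You want to outer join B to A, because not every A has an associated B.
Then you look at the rows block by block. A block is either all rows belonging to one B, or a single A row that has no B. b_id makes a good group key for the former, and a_id suits the latter. For a combined key there are different options. COALESCE(b_id, a_id) is not one of them, because the result set can contain an a_id 1 and a b_id 1 that we do not want in the same group. One solution is simply COALESCE(b_id, -a_id), provided of course that your IDs cannot be negative.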
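To illustrate why the sign flip matters, here is a small sketch on made-up rows. The inline `VALUES` data and the aliases `v`, `naive_key` and `safe_key` are hypothetical and only serve to show the collision.

```sql
select
    v.a_id,
    v.b_id,
    coalesce(v.b_id, v.a_id)  as naive_key,
    coalesce(v.b_id, -v.a_id) as safe_key
from (values
    (1, null),  -- an A row without a B
    (2, 1),     -- A rows belonging to B 1
    (3, 1)
) as v (a_id, b_id)
order by v.a_id;
-- a_id 1: naive_key = 1 (collides with B 1), safe_key = -1
-- a_id 2: naive_key = 1,                     safe_key = 1
-- a_id 3: naive_key = 1,                     safe_key = 1
```

With the naive key the lone A row would be merged into the group of B 1. The negated key keeps it apart as long as all IDs are positive.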
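Now, all your calculations are based on the aggregated groups, i.e. when rows belong to a B group you are not interested in the individual A values. For this reason I would aggregate right away and only join the individual A rows again at the very end.
The order of the rows is COALESCE(b_date, a_date).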
with grouped as
(
    select
        coalesce(b.b_id, -a.a_id) as grp_id,
        max(coalesce(b.date, a.date)) as grp_date,
        coalesce(max(b.weight), 0) as b_weight,
        sum(a.weight) as a_weight
    from a
    left join b on b.b_id = a.b_id
    group by coalesce(b.b_id, -a.a_id)
)
, calculated as
(
    select
        grp_id,
        grp_date,
        b_weight,
        a_weight,
        sum(a_weight - b_weight) over (order by grp_date) as running_diff
    from grouped
)
select *
from calculated c
join a on a.b_id = c.grp_id or a.a_id = -c.grp_id
order by c.grp_date, a.date;
Hope this works. I don't have a computer at hand and had to type this on my phone, which turned out to be harder than I thought :-)