How to loop through data and compare previous batch to current batch in SQL Server?
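Each time we receive a new batch of data, we load it into a table that stores every dataset we have received previously. Each new batch gets a new batch_id. The only other two columns are item_id and cost. I want to build a report describing how many new items were added (based on item_id), how many items were removed, and the net change in items in the latest batch. With that in mind, I want to compare all batches against each other, but only each higher batch_id against the batch_id immediately before it. To clarify, if I have 3 batch_ids, I want batch 3 compared to batch 2, and batch 2 compared to batch 1. Batch 1 has nothing to compare to. I can do this with the script below, but only for the batches I specify (the second row in the results), and with hundreds of batches that would be mind-numbing. Is there a way to make this more dynamic in SQL Server? I've provided a sample dataset and expected results below.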
WITH b1
     AS (SELECT *
         FROM example_cumulative
         WHERE batch_id = 2),
     b2
     AS (SELECT *
         FROM example_cumulative
         WHERE batch_id = 3)
SELECT SUM(CASE
               WHEN b1.item_id IS NULL
               THEN 1
               ELSE 0
           END) [new items],
       SUM(CASE
               WHEN b2.item_id IS NULL
               THEN 1
               ELSE 0
           END) [removed items],
       COUNT(b2.item_id) - COUNT(b1.item_id) [net change in item volume]
FROM b1
     FULL OUTER JOIN b2 ON b1.item_id = b2.item_id;
Sample data:
CREATE TABLE example_cumulative
(batch_id INT NOT NULL,
item_id INT NOT NULL,
cost FLOAT NOT NULL
);
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (1,10,100)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (1,11,100)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (1,12,106)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (1,13,142)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (1,14,152)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (1,15,154)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (2,10,110)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (2,11,140)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (2,12,100)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (2,16,100)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (3,10,110)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (3,11,100)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (3,16,195)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (3,17,102)
INSERT INTO example_cumulative (batch_id, item_id, cost) VALUES (3,18,101)
You can extend your current full join logic by dynamically joining consecutive batch_ids, like so:
select
    coalesce(e1.batch_id, e0.batch_id + 1) batch_id,
    sum(case when e0.item_id is null then 1 else 0 end) new_items,
    sum(case when e1.item_id is null then 1 else 0 end) removed_items,
    count(e1.item_id) - count(e0.item_id) net_change
from example_cumulative e1
full join example_cumulative e0
    on e1.item_id = e0.item_id
    and e1.batch_id = e0.batch_id + 1
where coalesce(e1.batch_id, e0.batch_id + 1) <= (select max(batch_id) from example_cumulative)
group by coalesce(e1.batch_id, e0.batch_id + 1)
The where clause prevents the query from returning an extra, unwanted record whose batch_id is greater than the maximum available value.
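To see why that filter is needed, here is a small illustrative check (not part of the original answer). Rows from the most recent batch have no later batch to match on the e1 side, so in the full join they would be grouped under max(batch_id) + 1 and counted as removed. The aliases phantom_batch_id and rows_counted_as_removed are names chosen here purely for illustration.

```sql
-- Illustrative check: rows of the latest batch that the full join would
-- report under a non-existent batch (max batch_id + 1) if the where
-- filter were dropped.
select
    e0.batch_id + 1 as phantom_batch_id,        -- hypothetical alias
    count(*)        as rows_counted_as_removed  -- hypothetical alias
from example_cumulative e0
where e0.batch_id = (select max(batch_id) from example_cumulative)
group by e0.batch_id;
```

With the sample data, batch 3 is the latest batch and holds five rows, so without the filter those five rows would surface as a batch 4 with five removed items.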
For your sample data, this returns:
batch_id | new_items | removed_items | net_change
-------: | --------: | ------------: | ---------:
1 | 6 | 0 | 6
2 | 1 | 3 | -2
3 | 2 | 1 | 1
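The join condition e1.batch_id = e0.batch_id + 1 relies on batch_id values increasing by exactly 1 with no gaps. If that may not hold for your data, one possible variant (a sketch, not part of the original answer) numbers the batches with dense_rank() and joins on that sequence instead. The CTE name ranked and the column batch_seq are names introduced here for illustration.

```sql
-- Sketch: compare each batch to the previous existing batch,
-- even when batch_id values are not consecutive.
with ranked as (
    select
        batch_id,
        item_id,
        dense_rank() over (order by batch_id) as batch_seq  -- 1, 2, 3, ... per distinct batch
    from example_cumulative
)
select
    max(r1.batch_id) as batch_id,  -- actual id of the later batch in each comparison
    sum(case when r0.item_id is null then 1 else 0 end) as new_items,
    sum(case when r1.item_id is null then 1 else 0 end) as removed_items,
    count(r1.item_id) - count(r0.item_id) as net_change
from ranked r1
full join ranked r0
    on r1.item_id = r0.item_id
    and r1.batch_seq = r0.batch_seq + 1
-- same guard as before: drop the phantom group after the last batch
where coalesce(r1.batch_seq, r0.batch_seq + 1) <= (select max(batch_seq) from ranked)
group by coalesce(r1.batch_seq, r0.batch_seq + 1)
order by batch_id;
```

For the sample data, where batch_id already runs 1, 2, 3, this should return the same counts as the table above. To leave out the first batch, which has nothing to compare against, an extra condition such as coalesce(r1.batch_seq, r0.batch_seq + 1) > 1 could be added to the where clause.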