Split the value of a total row across multiple other rows until their sum reaches the total row's value in Redshift
CREATE TABLE inbound (
    id SERIAL PRIMARY KEY,
    campaign VARCHAR,
    expected_inbound_date DATE,
    expected_inbound_quantity DECIMAL,
    received_inbound_quantity DECIMAL
);

INSERT INTO inbound
    (campaign, expected_inbound_date, expected_inbound_quantity, received_inbound_quantity)
VALUES
    ('C001', '2022-05-03', '500', '0'),
    ('C001', '2022-05-03', '800', '0'),
    ('C001', '2022-05-03', '400', '0'),
    ('C001', '2022-05-03', '200', '0'),
    ('C001', NULL, '0', '700'),
    ('C002', '2022-08-20', '3000', '0'),
    ('C002', '2022-08-20', '5000', '0'),
    ('C002', '2022-08-20', '2800', '0'),
    ('C002', NULL, '0', '4000');
Expected result:

campaign | expected_inbound_date | expected_inbound_quantity | split_received_inbound_quantity
---------|-----------------------|---------------------------|--------------------------------
C001     | 2022-05-03            | 200                       | 200
C001     | 2022-05-03            | 400                       | 400
C001     | 2022-05-03            | 500                       | 100
C001     | 2022-05-03            | 800                       | 0
C001     |                       |                           | 700
---------|-----------------------|---------------------------|--------------------------------
C002     | 2022-08-20            | 2800                      | 2800
C002     | 2022-08-20            | 3000                      | 1200
C002     | 2022-08-20            | 5000                      | 0
C002     |                       |                           | 4000
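I want to split the received_inbound_quantity of each campaign's total row across that campaign's expected_inbound_quantity rows, filling one row after another until the received_inbound_quantity total is used up.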
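To make the intended rule concrete, here is a minimal Python sketch of the allocation, not a Redshift solution. It assumes rows are filled from the smallest expected quantity upward, which is inferred from the ordering of the expected result; the function name `split_received` is hypothetical.

```python
def split_received(expected_quantities, received_total):
    """Allocate received_total across the expected quantities, smallest first.

    Assumption: rows are filled in ascending order of expected quantity,
    as the ordering of the expected result suggests.
    """
    remaining = received_total
    allocation = []
    for qty in sorted(expected_quantities):
        take = min(qty, remaining)  # never allocate more than is still left
        allocation.append((qty, take))
        remaining -= take
    return allocation


# Campaign C001: expected rows of 500, 800, 400, 200 and a received total of 700
for qty, take in split_received([500, 800, 400, 200], 700):
    print(qty, take)
# prints:
# 200 200
# 400 400
# 500 100
# 800 0
```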
Based on the answer to another question, I tried this solution:
SELECT
    i.campaign AS campaign,
    i.expected_inbound_date AS expected_inbound_date,
    i.expected_inbound_quantity AS expected_inbound_quantity,
    i.received_inbound_quantity AS received_inbound_quantity,
    (SELECT
        GREATEST(
            LEAST(i.expected_inbound_quantity,
                (SELECT
                    SUM(i3.received_inbound_quantity)
                 FROM inbound i3
                 WHERE i.campaign = i3.campaign) -
                (
                 SELECT
                    t1.cumulated_value AS cumulated_value
                 FROM
                    (SELECT
                        i2.campaign,
                        i2.expected_inbound_date,
                        i2.expected_inbound_quantity,
                        i2.received_inbound_quantity,
                        SUM(i2.expected_inbound_quantity) OVER (PARTITION BY i2.campaign ORDER BY i2.expected_inbound_date, i2.expected_inbound_quantity, i2.received_inbound_quantity ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS cumulated_value
                     FROM inbound i2
                     GROUP BY 1,2,3,4) t1
                 WHERE (t1.campaign, t1.expected_inbound_date, t1.expected_inbound_quantity, t1.received_inbound_quantity) = (i.campaign, i.expected_inbound_date, i.expected_inbound_quantity, i.received_inbound_quantity)
                )
            ),
            0
        )
    ) AS split
FROM inbound i
GROUP BY 1,2,3,4
ORDER BY 1,2,3,4
However, in Redshift I get this error:
Invalid operation: This type of correlated subquery pattern is not supported yet;
How do I need to modify the query so that it also works in Redshift?
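Window functions are your friend. When a query needs to compare rows with each other, window functions should be the first thing you look at on Redshift. They are simpler, cleaner, and faster than any self-join pattern.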
select
    campaign,
    expected_inbound_date,
    expected_inbound_quantity,
    received_inbound_quantity,
    case
        when (inbound_total - inbound_sum) >= 0 then expected_inbound_quantity
        else case
            when (expected_inbound_quantity + inbound_total - inbound_sum) >= 0
                then expected_inbound_quantity + inbound_total - inbound_sum
            else 0
        end
    end as split
from (
    SELECT
        campaign,
        expected_inbound_date,
        expected_inbound_quantity,
        received_inbound_quantity,
        -- running total of expected quantities within the campaign;
        -- Redshift requires an explicit frame clause when an aggregate window function uses ORDER BY
        sum(expected_inbound_quantity) over (
            partition by campaign
            order by expected_inbound_date, expected_inbound_quantity
            rows between unbounded preceding and current row
        ) as inbound_sum,
        -- received total of the campaign (only the total row carries a non-zero value)
        max(received_inbound_quantity) over (partition by campaign) as inbound_total
    FROM inbound i
) subq
ORDER BY 1,2,3,4;
Updated fiddle here: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=2381abdf5a90a997a4f05b809c892c40
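As a sanity check on the arithmetic in the query above, the following Python sketch mirrors its two steps for a single campaign: an inclusive running sum of the expected quantities, then the nested CASE logic. It is only an illustration under stated assumptions: the function name `split_with_running_sum` is hypothetical, all rows are assumed to share one date so that sorting by quantity matches the query's ordering, and the total row is left out because its expected quantity of 0 does not change the running sum.

```python
from itertools import accumulate


def split_with_running_sum(expected_quantities, inbound_total):
    """Mirror the query's logic for one campaign (illustrative sketch only)."""
    ordered = sorted(expected_quantities)
    # Inclusive running sum, row by row, playing the role of inbound_sum
    running_sums = accumulate(ordered)
    result = []
    for qty, inbound_sum in zip(ordered, running_sums):
        if inbound_total - inbound_sum >= 0:
            split = qty  # the row is fully covered by the received total
        elif qty + inbound_total - inbound_sum >= 0:
            split = qty + inbound_total - inbound_sum  # partially covered row
        else:
            split = 0  # the received total was used up by earlier rows
        result.append((qty, split))
    return result


# Campaign C002: expected rows of 3000, 5000, 2800 and a received total of 4000
print(split_with_running_sum([3000, 5000, 2800], 4000))
# prints: [(2800, 2800), (3000, 1200), (5000, 0)]
```

For the partially covered row, the running sum is 5800, so the split is 3000 + 4000 - 5800 = 1200.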
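When porting this to Redshift, you may want to convert the CASE expressions to DECODE() functions, since IMHO they are more readable.
PS. Thanks for setting up the fiddle; it makes providing an answer much faster.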