How to identify rows per group before a certain value gap?
I'd like to update a certain column in a table based on the difference in another column's value between neighboring rows in PostgreSQL.
Here is a test setup:
CREATE TABLE test(
main INTEGER,
sub_id INTEGER,
value_t INTEGER);
INSERT INTO test (main, sub_id, value_t)
VALUES
(1,1,8),
(1,2,7),
(1,3,3),
(1,4,85),
(1,5,40),
(2,1,3),
(2,2,1),
(2,3,1),
(2,4,8),
(2,5,41);
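My goal is to determine which value in column diff in each group main, starting from sub_id 1, exceeds a certain threshold (e.g. a difference greater than 10 or less than -10) by checking in ascending order by sub_id. Until the threshold is reached I would like to flag every passed row AND the one row where the condition is FALSE by filling column newval with a value, e.g. 1.
Should I use a loop or are there smarter solutions?
The task description in pseudocode: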
FOR i in GROUP [PARTITION BY main ORDER BY sub_id]:
DO until diff > 10 OR diff <-10
SET newval = 1 AND LEAD(newval) = 1
EXISTS on an aggregating subquery:
UPDATE test u
SET value_t = NULL
WHERE EXISTS (
SELECT * FROM (
SELECT main,sub_id
, value_t , ABS(value_t - lag(value_t)
OVER (PARTITION BY main ORDER BY sub_id) ) AS absdiff
FROM test
) x
WHERE x.main = u.main
AND x.sub_id <= u.sub_id
AND x.absdiff >= 10
)
;
SELECT * FROM test
ORDER BY main, sub_id;
Result:
UPDATE 3
main | sub_id | value_t
------+--------+---------
1 | 1 | 8
1 | 2 | 7
1 | 3 | 3
1 | 4 |
1 | 5 |
2 | 1 | 3
2 | 2 | 1
2 | 3 | 1
2 | 4 | 8
2 | 5 |
(10 rows)
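Your question was hard to understand: the "value_t" column was irrelevant to the question, and you forgot to define the "diff" column in your SQL.
Anyhow, here's your solution: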
WITH data AS (
SELECT main, sub_id, value_t
, abs(value_t
- lead(value_t) OVER (PARTITION BY main ORDER BY sub_id)) > 10 is_evil
FROM test
)
SELECT main, sub_id, value_t
, CASE max(is_evil::int)
OVER (PARTITION BY main ORDER BY sub_id
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
WHEN 1 THEN NULL ELSE 1 END newval
FROM data;
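I'm using a CTE to prepare the data (computing whether a row is "evil"), and then the "max" window function checks whether any "evil" row precedes the current one, per partition.
Basic SELECT
As fast as possible: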
SELECT *, bool_and(diff BETWEEN -10 AND 10) OVER (PARTITION BY main ORDER BY sub_id) AS flag
FROM (
SELECT *, value_t - lag(value_t, 1, value_t) OVER (PARTITION BY main ORDER BY sub_id) AS diff
FROM test
) sub;
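Fine points
Your mental model revolves around the window function lead(). But its counterpart lag() is a bit more efficient for the purpose, since there is no off-by-one error when including the row before the big gap. Alternatively, use lead() with inverted sort order (ORDER BY sub_id DESC).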
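To illustrate the point about lag() versus lead() with inverted sort order, here is a minimal sketch against the test table from the question. The column aliases prev_via_lag and prev_via_lead are made up for the illustration. It assumes sub_id is unique within each main, as in the sample data; otherwise the two columns are not guaranteed to agree.

```sql
-- Both expressions fetch value_t of the row with the next-smaller sub_id
-- in the same group; the first row of each group gets NULL in both columns.
SELECT main, sub_id, value_t
     , lag(value_t)  OVER (PARTITION BY main ORDER BY sub_id)      AS prev_via_lag
     , lead(value_t) OVER (PARTITION BY main ORDER BY sub_id DESC) AS prev_via_lead
FROM   test
ORDER  BY main, sub_id;
```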
To avoid NULL for the first row in the partition, provide value_t as default in the third parameter, which makes the diff 0 instead of NULL. Both lead() and lag() have that capability.
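A small side-by-side sketch of the default parameter, again on the test table. The aliases diff_null_first and diff_zero_first and the window name w are only illustrative.

```sql
SELECT main, sub_id, value_t
     , value_t - lag(value_t) OVER w             AS diff_null_first  -- NULL on the first row of each group
     , value_t - lag(value_t, 1, value_t) OVER w AS diff_zero_first  -- 0 on the first row of each group
FROM   test
WINDOW w AS (PARTITION BY main ORDER BY sub_id)
ORDER  BY main, sub_id;
```

With the default in place, the first row compares against itself, so it never counts as a gap and needs no special NULL handling in the outer query.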
diff BETWEEN -10 AND 10 is slightly faster than @diff < 11 (clearer and more flexible, too). (@ being the "absolute value" operator, equivalent to the abs() function.)
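The three spellings can be compared on a few hand-picked differences. The values below are hypothetical boundary cases, not taken from the test table, and the equivalence of `< 11` and `<= 10` only holds because diff is an integer.

```sql
-- All three boolean columns should agree for every integer diff.
SELECT diff
     , diff BETWEEN -10 AND 10 AS via_between
     , @ diff < 11             AS via_abs_operator
     , abs(diff) <= 10         AS via_abs_function
FROM  (VALUES (-45), (-11), (-10), (0), (10), (11), (82)) AS v(diff);
```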
bool_or() or bool_and() in the outer window function is probably the cheapest way to mark all rows up to the big gap.
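For comparison, a sketch of the same flag written with bool_or() instead of bool_and(). This is just one possible formulation: it negates the condition inside and the aggregate outside.

```sql
-- flag stays true until the first row whose diff falls outside [-10, 10];
-- from that row on, bool_or() has seen a violation and flag is false.
SELECT *
     , NOT bool_or(diff NOT BETWEEN -10 AND 10)
           OVER (PARTITION BY main ORDER BY sub_id) AS flag
FROM  (
   SELECT *, value_t - lag(value_t, 1, value_t) OVER (PARTITION BY main ORDER BY sub_id) AS diff
   FROM   test
   ) sub;
```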
Your UPDATE
Until the threshold is reached I would like to flag every passed row AND the one row where the condition is FALSE by filling column newval with a value e.g. 1.
Again, as fast as possible.
UPDATE test AS t
SET newval = 1
FROM (
SELECT main, sub_id
, bool_and(diff BETWEEN -10 AND 10) OVER (PARTITION BY main ORDER BY sub_id) AS flag
FROM (
SELECT main, sub_id
, value_t - lag(value_t, 1, value_t) OVER (PARTITION BY main ORDER BY sub_id) AS diff
FROM test
) sub
) u
WHERE (t.main, t.sub_id) = (u.main, u.sub_id)
AND u.flag;
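Note that the test setup in the question does not create a newval column, so the UPDATE can only run once that column exists. A minimal sketch, assuming a nullable integer column (the question does not state the type):

```sql
-- Assumption: newval is a plain nullable integer; run this before the UPDATE.
ALTER TABLE test ADD COLUMN newval integer;

-- After the UPDATE, inspect which rows were flagged.
SELECT main, sub_id, value_t, newval
FROM   test
ORDER  BY main, sub_id;
```

With the sample data, the rows before the first gap should carry newval = 1: sub_id 1 to 3 in group 1 and sub_id 1 to 4 in group 2.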
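Fine points
Computing all values in a single query is typically substantially faster than a correlated subquery.
The added WHERE condition AND u.flag makes sure we only update rows that actually need an update.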
If some of the rows may already have the right value in newval, add another clause to avoid those empty updates too: AND t.newval IS DISTINCT FROM 1
See:
- How do I (or can I) SELECT DISTINCT on multiple columns?
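Put together, the UPDATE with the extra clause might look like this. It is the same statement as above with one more predicate, and it assumes the newval column exists.

```sql
UPDATE test AS t
SET    newval = 1
FROM  (
   SELECT main, sub_id
        , bool_and(diff BETWEEN -10 AND 10) OVER (PARTITION BY main ORDER BY sub_id) AS flag
   FROM  (
      SELECT main, sub_id
           , value_t - lag(value_t, 1, value_t) OVER (PARTITION BY main ORDER BY sub_id) AS diff
      FROM   test
      ) sub
   ) u
WHERE (t.main, t.sub_id) = (u.main, u.sub_id)
AND    u.flag
AND    t.newval IS DISTINCT FROM 1;  -- NULL-safe: skips rows that already hold 1
```

IS DISTINCT FROM treats NULL as an ordinary comparable value, so rows where newval is still NULL are updated, while a second run of the same statement touches no rows.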
SET newval = 1 assigns a constant (even though we could use the actually calculated value in this case), which is a bit cheaper.
db<>fiddle here