How to label groups in PostgreSQL when group membership depends on the preceding row?
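I want to fill all NULL values in a query result with the last known value.
When the data is in a table rather than in a query, this is easy.
If I define and populate my table as follows: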
CREATE TABLE test_fill_null (
    date INTEGER,
    value INTEGER
);
INSERT INTO test_fill_null VALUES
    (1, 2),
    (2, NULL),
    (3, 45),
    (4, NULL),
    (5, NULL);
SELECT * FROM test_fill_null;
 date | value
------+-------
    1 |     2
    2 |
    3 |    45
    4 |
    5 |
Then I fill it like this:
UPDATE test_fill_null t1
SET value = (
    SELECT t2.value
    FROM test_fill_null t2
    WHERE t2.date <= t1.date AND t2.value IS NOT NULL
    ORDER BY t2.date DESC
    LIMIT 1
);
SELECT * FROM test_fill_null;
 date | value
------+-------
    1 |     2
    2 |     2
    3 |    45
    4 |    45
    5 |    45
But now I have a query like this one:
WITH
pre_table AS (
    SELECT
        id1,
        id2,
        tms,
        CASE
            WHEN tms - lag(tms) OVER w < interval '5 minutes' THEN NULL
            ELSE id2
        END AS group_id
    FROM
        table0
    WINDOW w AS (PARTITION BY id1 ORDER BY tms)
)
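Here group_id is set to id2 when the previous point is more than 5 minutes away, and to NULL otherwise. By doing so, I want to end up with groups of points that are less than 5 minutes apart from each other, with gaps of more than 5 minutes between groups.
From there I don't know how to proceed. I tried: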
SELECT DISTINCT ON (id1, id2)
    t0.id1,
    t0.id2,
    t0.tms,
    t1.group_id
FROM
    pre_table t0
    LEFT JOIN (
        SELECT
            id1,
            tms,
            group_id
        FROM pre_table t2
        WHERE t2.group_id IS NOT NULL
        ORDER BY tms DESC
    ) t1
    ON
        t1.tms <= t0.tms AND
        t1.id1 = t0.id1
WHERE
    t0.id1 IS NOT NULL
ORDER BY
    id1,
    id2,
    t1.tms DESC
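But in the final result, I get some groups containing two consecutive points that are more than 5 minutes apart. In that case they should be two different groups.
While editing my question, I found a solution. It is slow though, much slower than my example on a table. Any suggestions for improving it?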
SELECT
    t2.id1,
    t2.id2,
    t2.tms,
    (
        SELECT t1.group_id
        FROM pre_table t1
        WHERE
            t1.tms <= t2.tms
            AND t1.group_id IS NOT NULL
            AND t1.id1 = t2.id1  -- correlate inner and outer rows on id1
        ORDER BY t1.tms DESC
        LIMIT 1
    ) AS group_id
FROM
    pre_table t2
ORDER BY
    t2.id1,
    t2.id2,
    t2.tms
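As I said, it is a select within a select.
A "select within a select" is more commonly called a "subselect" or "subquery". In your particular case, it is a correlated subquery. LATERAL joins (new in Postgres 9.3) can largely replace correlated subqueries with more flexible solutions.
I don't think you need either here.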
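For reference only, the correlated subquery from the question could be written as a LATERAL join roughly like this. This is a sketch, not a recommendation: it assumes Postgres 9.3 or later and the question's table0 with columns id1, id2 and tms, and it has not been benchmarked against the original.

```sql
-- Sketch: same logic as the correlated subquery, expressed with LATERAL.
-- Assumes table0(id1, id2, tms) from the question exists.
WITH pre_table AS (
   SELECT id1, id2, tms
        , CASE WHEN tms - lag(tms) OVER w < interval '5 minutes'
               THEN NULL
               ELSE id2
          END AS group_id
   FROM   table0
   WINDOW w AS (PARTITION BY id1 ORDER BY tms)
   )
SELECT t2.id1, t2.id2, t2.tms, g.group_id
FROM   pre_table t2
LEFT   JOIN LATERAL (
   -- latest non-null group_id at or before the current row, same id1
   SELECT t1.group_id
   FROM   pre_table t1
   WHERE  t1.id1 = t2.id1
   AND    t1.tms <= t2.tms
   AND    t1.group_id IS NOT NULL
   ORDER  BY t1.tms DESC
   LIMIT  1
   ) g ON true
ORDER  BY t2.id1, t2.id2, t2.tms;
```

LEFT JOIN ... ON true keeps rows from t2 even if the lateral subquery finds no match.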
For your first case, this query is probably faster and simpler, though:
SELECT date, max(value) OVER (PARTITION BY grp) AS value
FROM  (
   SELECT *, count(value) OVER (ORDER BY date) AS grp
   FROM   test_fill_null
   ) sub;
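count() only counts non-null values, so grp is incremented with every non-null value, thereby forming groups as desired. It is then trivial to pick the one non-null value per grp in the outer SELECT.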
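To make the intermediate step visible, here is a small sketch that shows grp next to the original rows. It uses a VALUES list carrying the question's sample data under the same name as the table, so it runs without touching the real table.

```sql
-- Sketch: inspect the running count that forms the groups.
-- The CTE stands in for the question's table with its original sample rows.
WITH test_fill_null(date, value) AS (
   VALUES (1, 2), (2, NULL), (3, 45), (4, NULL), (5, NULL)
   )
SELECT date, value
     , count(value) OVER (ORDER BY date) AS grp
FROM   test_fill_null
ORDER  BY date;

--  date | value | grp
-- ------+-------+-----
--     1 |     2 |   1
--     2 |       |   1
--     3 |    45 |   2
--     4 |       |   2
--     5 |       |   2
```

Rows that come before the first non-null value would get grp = 0 and stay NULL, since there is no earlier value to carry forward.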
For your second case, I assume that the initial order of rows is determined by (id1, id2, tms), as indicated by one of your queries.
SELECT id1, id2, tms
     , count(step) OVER (ORDER BY id1, id2, tms) AS group_id
FROM  (
   SELECT *, CASE WHEN lag(tms, 1, '-infinity') OVER (PARTITION BY id1 ORDER BY id2, tms)
                     < tms - interval '5 min'
                  THEN true END AS step
   FROM   table0
   ) sub
ORDER  BY id1, id2, tms;
Adapt to your actual sort order. One of these might cover it:
PARTITION BY id1 ORDER BY id2 -- ignore tms
PARTITION BY id1 ORDER BY tms -- ignore id2
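As an illustration of the second variant (order by tms only), here is a self-contained sketch. The rows in the table0 CTE are made up for the example, and the column types (integer ids, timestamp tms) are assumptions about your actual table.

```sql
-- Sketch with hypothetical sample rows; the CTE stands in for the real table0.
WITH table0(id1, id2, tms) AS (
   VALUES
     (1, 10, timestamp '2015-01-01 10:00')
   , (1, 11, timestamp '2015-01-01 10:03')  -- 3 min after previous: same group
   , (1, 12, timestamp '2015-01-01 10:10')  -- 7 min after previous: new group
   , (1, 13, timestamp '2015-01-01 10:12')  -- 2 min after previous: same group
   , (2, 20, timestamp '2015-01-01 10:13')  -- first row of id1 = 2: new group
   )
SELECT id1, id2, tms
     , count(step) OVER (ORDER BY id1, tms) AS group_id
FROM  (
   SELECT *, CASE WHEN lag(tms, 1, '-infinity') OVER (PARTITION BY id1 ORDER BY tms)
                     < tms - interval '5 min'
                  THEN true END AS step
   FROM   table0
   ) sub
ORDER  BY id1, tms;

--  id1 | id2 |         tms         | group_id
-- -----+-----+---------------------+----------
--    1 |  10 | 2015-01-01 10:00:00 |        1
--    1 |  11 | 2015-01-01 10:03:00 |        1
--    1 |  12 | 2015-01-01 10:10:00 |        2
--    1 |  13 | 2015-01-01 10:12:00 |        2
--    2 |  20 | 2015-01-01 10:13:00 |        3
```

The '-infinity' default for lag() makes the first row of every id1 partition start a new group, so no separate NULL handling is needed.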
SQL Fiddle with an extended example.