SQL Filter to only consecutive numbers
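I have a table that is ordered by timestamp, and I only want to keep the rows whose step values are consecutive (marked with * below).
In imperative programming, it would be: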
prev_step = 0
output = []
for step in table.steps:  # already sorted by timestamp
    if step == prev_step + 1:
        output.append(step)  # desired row
        prev_step = step
The original table I have (the desired rows are decorated with *, which is not actually part of the data):
| timestamp | step |
| --------- | ---- |
| 100000001 | 5 |
| 100000002 | 1 |*
| 100000003 | 1 | ^
| 100000004 | 2 |*
| 100000005 | 2 | ^
| 100000006 | 4 |
| 100000007 | 5 |
| 100000008 | 3 |*
| 100000009 | 4 |*
| 100000010 | 2 |
| 100000011 | 5 |*
| 100000012 | 7 |
What I want:
| timestamp | step |
| --------- | ---- |
| 100000002 | 1 |*
| 100000004 | 2 |*
| 100000008 | 3 |*
| 100000009 | 4 |*
| 100000011 | 5 |*
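For reference, the loop from the question can be made runnable against the sample rows. This is only a minimal sketch: the `rows` list and the `keep_consecutive` function name are illustrative assumptions, not part of the original post.

```python
# Sample rows from the question as (timestamp, step), already sorted by timestamp.
rows = [
    (100000001, 5), (100000002, 1), (100000003, 1), (100000004, 2),
    (100000005, 2), (100000006, 4), (100000007, 5), (100000008, 3),
    (100000009, 4), (100000010, 2), (100000011, 5), (100000012, 7),
]


def keep_consecutive(rows):
    """Keep a row only when its step is exactly one more than the last kept step."""
    prev_step = 0
    output = []
    for timestamp, step in rows:
        if step == prev_step + 1:
            output.append((timestamp, step))
            prev_step = step  # advance only when a row is kept
    return output


result = keep_consecutive(rows)
for timestamp, step in result:
    print(timestamp, step)
```

Any SQL solution can be checked against the output of this loop.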
I have only come up with WHERE step - LAG(step) OVER (ORDER BY timestamp) <> 0, but that only removes adjacent duplicates (marked with ^ above). It certainly helps, but it is not enough.
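To make the LAG attempt concrete, here is a rough sketch that runs it on the sample rows using Python's built-in `sqlite3` module. The table name `yourTable` and the choice of SQLite are assumptions made for illustration. Because window functions cannot appear directly in a WHERE clause, the LAG is computed in a subquery first.

```python
import sqlite3

# Sample rows from the question as (timestamp, step).
rows = [
    (100000001, 5), (100000002, 1), (100000003, 1), (100000004, 2),
    (100000005, 2), (100000006, 4), (100000007, 5), (100000008, 3),
    (100000009, 4), (100000010, 2), (100000011, 5), (100000012, 7),
]

# Assumed setup: in-memory SQLite (window functions need SQLite 3.25+).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE yourTable ("timestamp" INTEGER, step INTEGER)')
conn.executemany("INSERT INTO yourTable VALUES (?, ?)", rows)

# LAG is evaluated in the inner query, then filtered in the outer one.
# The very first row has no previous row, so prev_step is NULL and the
# comparison is not true; that row is filtered out as well.
query = """
SELECT "timestamp", step
FROM (
    SELECT "timestamp", step,
           LAG(step) OVER (ORDER BY "timestamp") AS prev_step
    FROM yourTable
) AS s
WHERE step - prev_step <> 0
ORDER BY "timestamp"
"""
lag_filtered = conn.execute(query).fetchall()
conn.close()

for row in lag_filtered:
    print(row)
```

Only the two rows marked with ^ (and the first row) disappear; rows such as step 4 at 100000006 are still present, which is why this filter alone is not enough.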
Thanks in advance!
Here is a solution that relies on a correlated subquery to detect the correct record to keep at each step.
WITH cte AS (
    SELECT t1.*, (SELECT COUNT(DISTINCT t2.step) FROM yourTable t2
                  WHERE t2."timestamp" < t1."timestamp" AND t2.step < t1.step) AS cnt
    FROM yourTable t1
),
cte2 AS (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY cnt ORDER BY step, "timestamp") rn
    FROM cte t
)
SELECT t1."timestamp", t1.step, t1.rn, t1.cnt
FROM cte2 t1
WHERE rn = 1 AND (step = 1 OR EXISTS (SELECT 1 FROM yourTable t2
                                      WHERE t2.step = t1.step - 1))
ORDER BY "timestamp";
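As a quick way to try the query above, the following sketch runs it unchanged (minus the trailing semicolon) on an in-memory SQLite database through Python's `sqlite3` module. SQLite is an assumption here, since the thread does not name a database, and the `run_query` helper and the `tricky` input are hypothetical additions for illustration.

```python
import sqlite3

# The correlated-subquery answer, as posted.
QUERY = """
WITH cte AS (
    SELECT t1.*, (SELECT COUNT(DISTINCT t2.step) FROM yourTable t2
                  WHERE t2."timestamp" < t1."timestamp" AND t2.step < t1.step) AS cnt
    FROM yourTable t1
),
cte2 AS (
    SELECT t.*, ROW_NUMBER() OVER (PARTITION BY cnt ORDER BY step, "timestamp") rn
    FROM cte t
)
SELECT t1."timestamp", t1.step, t1.rn, t1.cnt
FROM cte2 t1
WHERE rn = 1 AND (step = 1 OR EXISTS (SELECT 1 FROM yourTable t2
                                      WHERE t2.step = t1.step - 1))
ORDER BY "timestamp"
"""


def run_query(rows):
    """Load (timestamp, step) rows into a fresh in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
    conn.execute('CREATE TABLE yourTable ("timestamp" INTEGER, step INTEGER)')
    conn.executemany("INSERT INTO yourTable VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Sample rows from the question.
sample = [
    (100000001, 5), (100000002, 1), (100000003, 1), (100000004, 2),
    (100000005, 2), (100000006, 4), (100000007, 5), (100000008, 3),
    (100000009, 4), (100000010, 2), (100000011, 5), (100000012, 7),
]
sample_result = run_query(sample)

# Hypothetical extra input, not from the original post: steps 1, 3, 4, 2.
tricky = [(1, 1), (2, 3), (3, 4), (4, 2)]
tricky_result = run_query(tricky)

for row in sample_result:
    print(row)  # (timestamp, step, rn, cnt)
```

On the sample data this returns the five desired rows. On the made-up sequence 1, 3, 4, 2 it also returns the step-4 row, which the imperative loop from the question would skip, so it may be worth testing the query against data shaped like your own.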
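One method is a recursive CTE. Unfortunately, recursive CTEs are limited, so this approach generates every path through the data in step order and then chooses the minimum timestamp for each step: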
with cte(ts, step) as (
      (select ts, step
       from t
       where step = 1
       order by ts
       fetch first 1 row only)
      union all
      select t.ts, t.step
      from cte join
           t
           on t.ts >= cte.ts and t.step = cte.step + 1
     )
select *
from (select cte.*,
             row_number() over (partition by step order by ts) as seqnum
      from cte
     ) cte
where seqnum = 1;
Here is a db<>fiddle.
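For anyone who wants to experiment locally, here is a sketch of the same recursive idea adapted to SQLite and run through Python's `sqlite3` module. It is a variation, not the exact query above: SQLite has no `fetch first` syntax, so the anchor picks the earliest step-1 row with `MIN(ts)` instead, and the `RECURSIVE` keyword and the `ranked` alias are likewise adaptations. The table `t` with columns `ts` and `step` follows the naming used in this answer.

```python
import sqlite3

# Sample rows from the question as (ts, step).
rows = [
    (100000001, 5), (100000002, 1), (100000003, 1), (100000004, 2),
    (100000005, 2), (100000006, 4), (100000007, 5), (100000008, 3),
    (100000009, 4), (100000010, 2), (100000011, 5), (100000012, 7),
]

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.execute("CREATE TABLE t (ts INTEGER, step INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

query = """
WITH RECURSIVE cte(ts, step) AS (
    -- Anchor: earliest row with step = 1 (MIN stands in for FETCH FIRST 1 ROW ONLY).
    SELECT MIN(ts), step
    FROM t
    WHERE step = 1
    GROUP BY step
    UNION ALL
    -- Recursive part: every row at or after the current one whose step is one higher.
    SELECT t.ts, t.step
    FROM cte
    JOIN t ON t.ts >= cte.ts AND t.step = cte.step + 1
)
SELECT ts, step
FROM (
    SELECT cte.*,
           ROW_NUMBER() OVER (PARTITION BY step ORDER BY ts) AS seqnum
    FROM cte
) AS ranked
WHERE seqnum = 1  -- keep the earliest timestamp reached for each step
ORDER BY ts
"""
cte_result = conn.execute(query).fetchall()
conn.close()

for row in cte_result:
    print(row)
```

Because every path is enumerated before the earliest row per step is picked, the intermediate result can grow quickly when the same step value repeats many times.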