Postgres consecutive days, gaps and islands, Tabibitosan
I have the following database table:
date | name |
---|---|
2014-08-10 | bob |
2014-08-10 | sue |
2014-08-11 | bob |
2014-08-11 | mike |
2014-08-12 | bob |
2014-08-12 | mike |
2014-08-05 | bob |
2014-08-06 | bob |
SELECT t.Name,COUNT(*) as frequency
FROM (
SELECT Name,Date,
row_number() OVER (
ORDER BY Date
) - row_number() OVER (
PARTITION BY Name ORDER BY Date
) + 1 seq
FROM orders
) t
GROUP BY Name,seq;
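I tried the Tabibitosan method for finding gaps and islands, and it produced the table below, which is incorrect. Since the 11th and the 12th are consecutive days, the name "mike" should be counted as 2. How can I fix this?
name | frequency |
---|---|
mike | 1 |
bob | 3 |
bob | 2 |
mike | 1 |
sue | 1 |
The correct, expected output is:
name | frequency |
---|---|
bob | 3 |
bob | 2 |
mike | 2 |
sue | 1 |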
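You are using the wrong logic. Basically, you want dates that are consecutive, so you want to subtract the sequence number from the date. Within one name, consecutive dates then all map to the same value, which serves as the group key: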
SELECT t.Name, COUNT(*) as frequency
FROM (SELECT o.*,
row_number() OVER (PARTITION BY Name ORDER BY Date) as seqnum
FROM orders o
) t
GROUP BY Name, date - seqnum * interval '1 day';
Here is a db<>fiddle.
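To see why grouping on the date minus the row number works, it can help to look at the intermediate values. The following is a minimal sketch, not part of the original answer: it inlines the question's rows in a CTE named `orders` (the column names `date` and `name` are assumed from the question), and the alias `grp` is purely illustrative. The row number is cast to `int` so that `date - integer` yields a plain `date`.

```sql
-- Sketch only: the question's rows inlined as a CTE so the query runs on its own.
WITH orders(date, name) AS (
    VALUES (DATE '2014-08-10', 'bob'),
           (DATE '2014-08-10', 'sue'),
           (DATE '2014-08-11', 'bob'),
           (DATE '2014-08-11', 'mike'),
           (DATE '2014-08-12', 'bob'),
           (DATE '2014-08-12', 'mike'),
           (DATE '2014-08-05', 'bob'),
           (DATE '2014-08-06', 'bob')
)
SELECT t.name,
       t.date,
       t.seqnum,
       t.date - t.seqnum AS grp   -- constant within a run of consecutive dates
FROM (
    SELECT o.*,
           -- row_number() returns bigint; cast so that date - integer is defined
           (row_number() OVER (PARTITION BY o.name ORDER BY o.date))::int AS seqnum
    FROM orders o
) t
ORDER BY t.name, t.date;
```

For bob, the dates 2014-08-05 and 2014-08-06 both give `grp` = 2014-08-04, while 2014-08-10 through 2014-08-12 all give 2014-08-07, so bob falls into two groups of sizes 2 and 3. Both of mike's rows give 2014-08-10, which is why he is counted once with a frequency of 2.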
The gaps and islands problem, solved in PostgreSQL.
Run this working demo:
drop table if exists foobar;
CREATE TABLE foobar( tick text, date_val date );
insert into foobar values('XYZ', '2021-01-03'); --island 1 has width 2
insert into foobar values('XYZ', '2021-01-04'); --island 1
insert into foobar values('XYZ', '2021-05-09'); --island 2 has width 3
insert into foobar values('XYZ', '2021-05-10'); --island 2
insert into foobar values('XYZ', '2021-05-11'); --island 2
insert into foobar values('XYZ', '2021-07-07'); --island 3 has width 4
insert into foobar values('XYZ', '2021-07-08'); --island 3
insert into foobar values('XYZ', '2021-07-09'); --island 3
insert into foobar values('XYZ', '2021-07-10'); --island 3
insert into foobar values('XYZ', '2022-10-10'); --island 4 has width 1
select tick, island_width, min_val, max_val,
min_val - lag(max_val) over (order by max_val)
as gap_width from
(
select tick, count(*) as island_width,
min(date_val) min_val, max(date_val) max_val
from (
select t.*,
row_number() over ( partition by tick order by date_val ) as seqnum
from foobar t where tick = 'XYZ'
) t
group by tick, date_val - seqnum * interval '1 day'
) t2 order by max_val desc
Output:
┌──────┬──────────────┬────────────┬────────────┬───────────┐
│ tick │ island_width │  min_val   │  max_val   │ gap_width │
├──────┼──────────────┼────────────┼────────────┼───────────┤
│ XYZ  │            1 │ 2022-10-10 │ 2022-10-10 │       457 │
│ XYZ  │            4 │ 2021-07-07 │ 2021-07-10 │        57 │
│ XYZ  │            3 │ 2021-05-09 │ 2021-05-11 │       125 │
│ XYZ  │            2 │ 2021-01-03 │ 2021-01-04 │         ¤ │
└──────┴──────────────┴────────────┴────────────┴───────────┘
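The island_width column gives the number of consecutive dates in each island. gap_width gives the number of days from the previous island's last date to this island's first date; it is null for the earliest island, which has no predecessor.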
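Because gap_width is a difference between two dates, the number of dates actually missing between two islands is one less than that. The sketch below is one possible variant, not the answerer's query: it inlines the demo rows in a CTE named `foobar` so that it runs on its own, and the names `numbered`, `islands`, and `missing_days` are illustrative.

```sql
-- Sketch only: the demo rows inlined as a CTE; it shadows any real foobar table.
WITH foobar(tick, date_val) AS (
    VALUES ('XYZ', DATE '2021-01-03'),
           ('XYZ', DATE '2021-01-04'),
           ('XYZ', DATE '2021-05-09'),
           ('XYZ', DATE '2021-05-10'),
           ('XYZ', DATE '2021-05-11'),
           ('XYZ', DATE '2021-07-07'),
           ('XYZ', DATE '2021-07-08'),
           ('XYZ', DATE '2021-07-09'),
           ('XYZ', DATE '2021-07-10'),
           ('XYZ', DATE '2022-10-10')
)
SELECT tick,
       min_val,
       max_val,
       -- date - date yields an integer number of days; subtract 1 to count
       -- only the dates strictly between the two islands
       (min_val - lag(max_val) OVER (PARTITION BY tick ORDER BY min_val)) - 1
           AS missing_days
FROM (
    SELECT tick,
           min(date_val) AS min_val,
           max(date_val) AS max_val
    FROM (
        SELECT f.*,
               (row_number() OVER (PARTITION BY f.tick ORDER BY f.date_val))::int AS seqnum
        FROM foobar f
    ) numbered
    GROUP BY tick, date_val - seqnum
) islands
ORDER BY tick, min_val;
```

With the demo data this should report 124, 56, and 456 missing days for the second, third, and fourth islands, and null for the first. Partitioning the `lag` by `tick` is a design choice that keeps the gaps correct if the `where tick = 'XYZ'` filter is dropped and several tickers are processed at once.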