Psql - generate series with running total

I have the following table:

create table account_info(
    id int not null unique,
    creation_date date,
    deletion_date date,
    gather boolean)

I'm adding sample data to it:

insert into account_info(id,creation_date,deletion_date,gather)
values(1,'2019-09-10',null,true),
(2,'2019-09-12',null,true),
(3,'2019-09-14','2019-10-08',true),
(4,'2019-09-15','2019-09-18',true),
(5,'2019-09-22',null,false),
(6,'2019-09-27','2019-09-29',true),
(7,'2019-10-04','2019-10-17',false),
(8,null,'2019-10-20',true),
(9,'2019-10-12',null,true),
(10,'2019-10-18',null,true)

I want to see how many accounts were added and how many were deleted, grouped by week.

I tried the following:

select dd, count(distinct ai.id) as created ,count(distinct ai2.id) as deleted

from generate_series('2019-09-01'::timestamp, 
                     '2019-10-21'::timestamp, '1 week'::interval) dd
left join account_info ai on ai.creation_date::DATE <= dd::DATE
left join account_info ai2 on ai2.deletion_date::DATE <=dd::DATE
where ai.gather is true
and ai2.gather is true
group by dd
order by dd asc

This produces the following output:

+------------+---------+---------+
| dd         | Created | Deleted |
+------------+---------+---------+
| 2019-09-22 |       4 |       1 |
| 2019-09-29 |       5 |       2 |
| 2019-10-06 |       5 |       2 |
| 2019-10-13 |       6 |       3 |
| 2019-10-20 |       7 |       4 |

This output shows the running totals of created and deleted accounts.

However, I would like to see something like this:

+------------+---------+---------+-------------------+-------------------+
|     dd     | Created | Deleted | Total Sum Created | Total Sum Deleted |
+------------+---------+---------+-------------------+-------------------+
| 2019-09-22 | 4       | 1       |                 4 |                 1 |
| 2019-09-29 | 1       | 1       |                 5 |                 2 |
| 2019-10-06 | NULL    | NULL    |                 5 |                 2 |
| 2019-10-13 | 1       | 1       |                 6 |                 3 |
| 2019-10-20 | 1       | 1       |                 7 |                 4 |

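When I try to sum the created and deleted columns in psql, I get an error message, because aggregate functions cannot be nested.

You can turn your existing query into a subquery and use lag() to compute the difference between consecutive records: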

select 
    dd,
    created - coalesce(lag(created) over(order by dd), 0) created,
    deleted - coalesce(lag(deleted) over(order by dd), 0) deleted,
    created total_sum_created,
    deleted total_sum_deleted
from (
    select 
        dd, 
        count(distinct ai.id) as created ,
        count(distinct ai2.id) as deleted
    from 
        generate_series(
            '2019-09-01'::timestamp, 
            '2019-10-21'::timestamp, 
            '1 week'::interval
        ) dd
        left join account_info ai 
            on ai.creation_date::DATE <= dd::DATE and ai.gather is true
        left join account_info ai2 
            on ai2.deletion_date::DATE <=dd::DATE and ai2.gather is true
    group by dd
) x
order by dd asc

I moved the conditions ai[2].gather = true to the on side of the joins: putting these conditions in the where clause would basically turn your left joins into inner joins.
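
As a rough illustration of that point, compare the two placements of the filter. The weeks and accounts tables below are made up inline with values and are not part of the question:

```sql
-- Hypothetical inline data: two weeks, one account that belongs to week 1.
with weeks(wk) as (values (1), (2)),
     accounts(id, wk, gather) as (values (10, 1, true))
-- Filter in where: for week 2 the joined a.gather is null, so the row is dropped.
select 'filter in where' as variant, w.wk, a.id
from weeks w
left join accounts a on a.wk = w.wk
where a.gather is true
union all
-- Filter in on: week 2 has no match but is still kept, with a.id = null.
select 'filter in on', w.wk, a.id
from weeks w
left join accounts a on a.wk = w.wk and a.gather is true
order by variant, wk;
```

The where variant returns only week 1, while the on variant returns both weeks, the second one with a null id.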

Demo on DB Fiddle:

| dd                       | created | deleted | total_sum_created | total_sum_deleted |
| ------------------------ | ------- | ------- | ----------------- | ----------------- |
| 2019-09-01T00:00:00.000Z | 0       | 0       | 0                 | 0                 |
| 2019-09-08T00:00:00.000Z | 0       | 0       | 0                 | 0                 |
| 2019-09-15T00:00:00.000Z | 4       | 0       | 4                 | 0                 |
| 2019-09-22T00:00:00.000Z | 0       | 1       | 4                 | 1                 |
| 2019-09-29T00:00:00.000Z | 1       | 1       | 5                 | 2                 |
| 2019-10-06T00:00:00.000Z | 0       | 0       | 5                 | 2                 |
| 2019-10-13T00:00:00.000Z | 1       | 1       | 6                 | 3                 |
| 2019-10-20T00:00:00.000Z | 1       | 1       | 7                 | 4                 |

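Another option is to use lag() together with generate_series() to generate a list of date ranges. Then you need only one join on the original table, and you can do conditional aggregation in the outer query: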

select
    dd,
    count(distinct case 
        when ai.creation_date::date <= dd::date and ai.creation_date::date > lag_dd::date 
        then ai.id 
    end) created,
    count(distinct case 
        when ai.deletion_date::date <= dd::date and ai.deletion_date::date > lag_dd::date 
        then ai.id 
    end) deleted,
    count(distinct case 
        when ai.creation_date::date <= dd::date 
        then ai.id 
    end) total_sum_created,
    count(distinct case 
        when ai.deletion_date::date <= dd::date 
        then ai.id 
    end) total_sum_deleted
from 
    (
        select dd, lag(dd) over(order by dd) lag_dd
        from generate_series(
            '2019-09-01'::timestamp, 
            '2019-10-21'::timestamp, 
            '1 week'::interval
        ) dd
    ) dd
    left join account_info ai on ai.gather is true
group by dd
order by dd

Demo on DB Fiddle
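
If you are on PostgreSQL 9.4 or later, the same conditional aggregation could also be written with the aggregate filter clause instead of case. A sketch for the two creation columns only, assuming the account_info table from the question; the aliases g and w are my own naming, and the deletion columns would follow the same pattern:

```sql
select
    w.dd,
    -- accounts created inside the week that ends at dd
    count(distinct ai.id) filter (
        where ai.creation_date <= w.dd::date and ai.creation_date > w.lag_dd::date
    ) as created,
    -- accounts created at any time up to dd
    count(distinct ai.id) filter (
        where ai.creation_date <= w.dd::date
    ) as total_sum_created
from
    (
        select g.dd, lag(g.dd) over(order by g.dd) as lag_dd
        from generate_series(
            '2019-09-01'::timestamp,
            '2019-10-21'::timestamp,
            '1 week'::interval
        ) as g(dd)
    ) w
    left join account_info ai on ai.gather is true
group by w.dd
order by w.dd;
```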

You can generate the results you want by using a series of CTEs to build up the tables of data:

with dd as
(select *
 from generate_series('2019-09-01'::timestamp, 
                      '2019-10-21'::timestamp, '1 week'::interval) d),
ddl as
(select d, coalesce(lag(d) over (order by d), '1970-01-01'::timestamp) as pd
 from dd),
counts as
(select d, count(distinct ai.id) as created, count(distinct ai2.id) as deleted
 from ddl
 left join account_info ai on ai.creation_date::DATE > ddl.pd::DATE AND ai.creation_date::DATE <= ddl.d::DATE AND ai.gather is true
 left join account_info ai2 on ai2.deletion_date::DATE > ddl.pd::DATE AND ai2.deletion_date::DATE <= ddl.d::DATE AND ai2.gather is true
 group by d)
select d, created, deleted,
       sum(created) over (order by d rows unbounded preceding) as "total created",
       sum(deleted) over (order by d rows unbounded preceding) as "total deleted"
from counts
order by d asc

Note that the gather condition needs to be part of the left join, to avoid turning it into an inner join.

Output:

d                       created     deleted     total created   total deleted
2019-09-01 00:00:00     0           0           0               0
2019-09-08 00:00:00     0           0           0               0
2019-09-15 00:00:00     4           0           4               0
2019-09-22 00:00:00     0           1           4               1
2019-09-29 00:00:00     1           1           5               2
2019-10-06 00:00:00     0           0           5               2
2019-10-13 00:00:00     1           1           6               3
2019-10-20 00:00:00     1           1           7               4

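Note that this query gives the results for the week ending with d. If you want the results for the week starting with d, the lag can be changed to a lead. You can see this in my demo.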

Demo on dbfiddle
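
For reference, a sketch of what the week-starting variant might look like, assuming the account_info table from the question. This is my reading of the lag-to-lead change rather than a copy of the demo: the join bounds flip to d <= date < nd, and the fallback of d + 1 week for the last row is an assumption.

```sql
with dd as
(select d
 from generate_series('2019-09-01'::timestamp,
                      '2019-10-21'::timestamp, '1 week'::interval) d),
ddl as
-- nd is the start of the following week; the last row has no successor,
-- so it falls back to d + 1 week (assumed)
(select d, coalesce(lead(d) over (order by d), d + interval '1 week') as nd
 from dd),
counts as
(select d, count(distinct ai.id) as created, count(distinct ai2.id) as deleted
 from ddl
 left join account_info ai on ai.creation_date >= ddl.d::date AND ai.creation_date < ddl.nd::date AND ai.gather is true
 left join account_info ai2 on ai2.deletion_date >= ddl.d::date AND ai2.deletion_date < ddl.nd::date AND ai2.gather is true
 group by d)
select d, created, deleted,
       sum(created) over (order by d) as "total created",
       sum(deleted) over (order by d) as "total deleted"
from counts
order by d asc;
```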

A lateral join and aggregation are very well suited to this problem. If you are content with the weeks that appear in the data:

select date_trunc('week', dte) as week,
       sum(is_create) as creates_in_week,
       sum(is_delete) as deletes_in_week,
       sum(sum(is_create)) over (order by min(v.dte)) as running_creates,
       sum(sum(is_delete)) over (order by min(v.dte)) as running_deletes
from account_info ai cross join lateral
     (values (ai.creation_date, 1, 0), (ai.deletion_date, 0, 1)
     ) v(dte, is_create, is_delete)
where v.dte is not null and ai.gather
group by week
order by week;
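
To see what the lateral values clause contributes on its own, here is a minimal sketch that lists the unpivoted rows for a single account from the sample data (id 4 is picked arbitrarily):

```sql
-- Each account row is expanded into two event rows:
-- one for its creation date and one for its deletion date.
select ai.id, v.dte, v.is_create, v.is_delete
from account_info ai cross join lateral
     (values (ai.creation_date, 1, 0), (ai.deletion_date, 0, 1)
     ) v(dte, is_create, is_delete)
where ai.id = 4;
```

With the sample data this yields one row for 2019-09-15 flagged as a create and one row for 2019-09-18 flagged as a delete, which is what lets a single sum() per flag count both kinds of event.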

If you want it for a specified range of weeks:

select gs.wk,
       coalesce(sum(v.is_create), 0) as creates_in_week,
       coalesce(sum(v.is_delete), 0) as deletes_in_week,
       coalesce(sum(sum(v.is_create)) over (order by gs.wk), 0) as running_creates,
       coalesce(sum(sum(v.is_delete)) over (order by gs.wk), 0) as running_deletes
from generate_series('2019-09-01'::timestamp, 
                     '2019-10-21'::timestamp, '1 week'::interval) gs(wk) left join
    ( account_info ai cross join lateral
      (values (ai.creation_date, 1, 0), (ai.deletion_date, 0, 1)
      ) v(dte, is_create, is_delete)
    )
    -- the gather filter sits in the on clause so that weeks without events are kept;
    -- null dates never satisfy the range comparison, so no separate null check is needed
    on v.dte >= gs.wk and
       v.dte < gs.wk + interval '1 week' and
       ai.gather
group by gs.wk
order by gs.wk;

Here is a db<>fiddle.