Grouped aggregate counts for each date in a series of dates
I'm trying to group task counts by state across a series of dates, using the following tables:
tasks
-----
| id | title | state_id | inserted_at |
| -- | ----------- | -------- | ------------------- |
| 1 | First Task | 1 | 2022-05-05 19:16:44 |
| 2 | Second Task | 1 | 2022-05-07 18:54:40 |
| 3 | Third Task | 1 | 2022-05-07 19:18:28 |
| 4 | Fourth Task | 1 | 2022-05-10 15:28:57 |
task_states
-----
| id | label |
| -- | ---------- |
| 1 | Assigns |
| 2 | In Process |
| 3 | Completed |
task_logs
-----
| id | event | target | value | task_id | inserted_at |
| -- | ------- | ------ | ---------- | ------- | -------------------|
| 1 | changed | state | Assigns | 1 | 2022-05-05 19:16:44|
| 2 | changed | state | In Progress| 1 | 2022-05-06 11:43:14|
| 3 | changed | state | Assigns | 2 | 2022-05-07 18:54:40|
| 4 | changed | state | Assigns | 3 | 2022-05-07 19:18:28|
| 5 | changed | state | Completed | 1 | 2022-05-08 12:11:38|
| 6 | changed | state | In Progress| 2 | 2022-05-09 09:22:53|
| 7 | changed | state | Assigns | 4 | 2022-05-10 15:28:57|
| 8 | changed | state | Completed | 2 | 2022-05-11 11:21:53|
| 9 | changed | state | In Progress| 3 | 2022-05-11 17:42:02|
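There is no consistent daily "state" record for each task, because task_logs only has an entry when a task changes state. This means I have to fetch each task's last "state change" log on or before a given date. I have the following query, which gets the task count per state as of a single day: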
SELECT date('2022-05-10'), state.id as state_id, state.label, count(sub.id)
FROM (
SELECT DISTINCT ON (t.id) t.id, logs.value
FROM tasks t
INNER JOIN task_logs logs ON logs.task_id = t.id
WHERE date(logs.inserted_at) <= date('2022-05-10') AND logs.target = 'state'
ORDER BY t.id, logs.inserted_at DESC
) sub
RIGHT JOIN task_states state ON state.label = sub.value
GROUP BY state.id
ORDER BY state.id;
------------------
| date | state_id | label | count |
| ---------- | -------- | ---------- | ----- |
| 2022-05-10 | 1 | Assigns | 2 |
| 2022-05-10 | 2 | In Process | 1 |
| 2022-05-10 | 3 | Completed | 1 |
My trouble comes from trying to combine the query above with generate_series to get daily counts over a range of dates, for example:
| date | state_id | label | count |
| ---------- | -------- | ----------- | ----- |
| 2022-05-05 | 1 | Assigns | 1 |
| 2022-05-05 | 2 | In Progress | 0 |
| 2022-05-05 | 3 | Complete | 0 |
| 2022-05-06 | 1 | Assigns | 0 |
| 2022-05-06 | 2 | In Progress | 1 |
| 2022-05-06 | 3 | Complete | 0 |
| 2022-05-07 | 1 | Assigns | 2 |
| 2022-05-07 | 2 | In Progress | 1 |
| 2022-05-07 | 3 | Complete | 0 |
| 2022-05-08 | 1 | Assigns | 2 |
| 2022-05-08 | 2 | In Progress | 0 |
| 2022-05-08 | 3 | Complete | 1 |
| 2022-05-09 | 1 | Assigns | 1 |
| 2022-05-09 | 2 | In Progress | 1 |
| 2022-05-09 | 3 | Complete | 1 |
| 2022-05-10 | 1 | Assigns | 2 |
| 2022-05-10 | 2 | In Progress | 1 |
| 2022-05-10 | 3 | Complete | 1 |
| 2022-05-11 | 1 | Assigns | 1 |
| 2022-05-11 | 2 | In Progress | 1 |
| 2022-05-11 | 3 | Complete | 2 |
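Here is a dbfiddle setup using the tables above. Any thoughts or ideas on how to run the query above (or rewrite it) for each date in a series of dates (generate_series(current_date - interval '5 day', current_date, '1 day')) would be much appreciated!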
Consider a stored function that loops over the generated series of dates and captures each daily aggregate snapshot:
CREATE OR REPLACE FUNCTION build_daily_log_agg(_interval_days TEXT)
RETURNS TABLE ("date" TEXT,
state_id INTEGER,
state_label TEXT,
"count" INTEGER)
LANGUAGE plpgsql AS
$func$
DECLARE dt RECORD;
BEGIN
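-- Reuse the temp table if an earlier call in this session already created it
CREATE TEMPORARY TABLE IF NOT EXISTS daily_log_agg (
"date" TEXT,
state_id INTEGER,
state_label TEXT,
"count" INTEGER
);
TRUNCATE daily_log_agg;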
FOR dt IN SELECT dates FROM generate_series(
current_date - _interval_days::interval,
current_date, '1 day'
) AS dates LOOP
INSERT INTO daily_log_agg ("date", state_id, state_label, "count")
SELECT dt.dates AS "date",
state.id AS state_id,
state.label,
COUNT(sub.id) AS "count"
FROM (
SELECT DISTINCT ON (t.id) t.id, logs.value
FROM tasks t
INNER JOIN task_logs logs ON logs.task_id = t.id
WHERE date(logs.inserted_at) <= dt.dates
AND logs.target = 'state'
ORDER BY t.id, logs.inserted_at DESC
) sub
RIGHT JOIN task_states state ON state.label = sub.value
GROUP BY state.id
ORDER BY state.id;
END LOOP;
RETURN QUERY
SELECT * FROM daily_log_agg;
END
$func$;
SELECT * FROM build_daily_log_agg('12 days');
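If you would rather avoid the loop and the temp table, the same idea can probably be written as one set-based query. What follows is only a sketch, assuming PostgreSQL 9.3 or later (for LATERAL) and the tables from the question; the aliases d, day, and sub are names I made up, and the date range is hard-coded to match the sample data. It joins every generated day to every state, then uses a lateral subquery to pick each task's latest state log on or before that day. Note that the join compares task_states.label with task_logs.value literally, so a state whose label is spelled differently in the two tables will count as 0.

```sql
-- Sketch only: one row per (day, state), counting tasks whose most
-- recent 'state' log on or before that day matches the state label.
SELECT d.day::date   AS "date",
       state.id      AS state_id,
       state.label,
       count(sub.id) AS "count"
FROM generate_series(timestamp '2022-05-05',
                     timestamp '2022-05-11',
                     interval '1 day') AS d(day)
CROSS JOIN task_states state
LEFT JOIN LATERAL (
    -- Latest state log per task as of the end of d.day
    SELECT DISTINCT ON (logs.task_id) logs.task_id AS id, logs.value
    FROM task_logs logs
    WHERE logs.target = 'state'
      AND logs.inserted_at < d.day + interval '1 day'
    ORDER BY logs.task_id, logs.inserted_at DESC
) sub ON sub.value = state.label
GROUP BY d.day, state.id, state.label
ORDER BY d.day, state.id;
```

The `inserted_at < d.day + interval '1 day'` filter is meant to be equivalent to `date(inserted_at) <= d.day` while leaving the column unwrapped, which may let an index on inserted_at be used. To cover a rolling window instead, the two timestamp literals could be swapped for `current_date - interval '5 day'` and `current_date`.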