Handling of generate_series() in queries with date or timestamp with / without time zone
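I have a query that generates a report from a series of dates, grouped by date and employee_id. The dates should be based on a particular time zone, in this case 'Asia/Kuala_Lumpur'. But this can change depending on the time zone the user is in.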
SELECT
d::date AT TIME ZONE 'Asia/Kuala_Lumpur' AS created_date,
e.id,
e.name,
e.division_id,
ARRAY_AGG(
a.id
) as rows,
MIN(a.created_at) FILTER (WHERE a.activity_type = 1) as min_time_in,
MAX(a.created_at) FILTER (WHERE a.activity_type = 2) as max_time_out,
ARRAY_AGG(
CASE
WHEN a.activity_type = 1
THEN a.created_at
ELSE NULL
END
) as check_ins,
ARRAY_AGG(
CASE
WHEN a.activity_type = 2
THEN a.created_at
ELSE NULL
END
) as check_outs
FROM (SELECT MIN(created_at), MAX(created_at) FROM attendance) AS r(startdate,enddate)
, generate_series(
startdate::timestamp,
enddate::timestamp,
interval '1 day') g(d)
CROSS JOIN employee e
LEFT JOIN attendance a ON a.created_at::date = d::date AND e.id = a.employee_id
where d::date = date '2020-11-20' and division_id = 1
GROUP BY
created_date
, e.id
, e.name
, e.division_id
ORDER BY
created_date
, e.id;
Definition and sample data for table attendance:
CREATE TABLE attendance (
id int,
employee_id int,
activity_type int,
created_at timestamp with time zone NOT NULL
);
INSERT INTO attendance VALUES
( 1, 1, 1,'2020-11-18 07:10:25 +00:00'),
( 2, 2, 1,'2020-11-18 07:30:25 +00:00'),
( 3, 3, 1,'2020-11-18 07:50:25 +00:00'),
( 4, 2, 2,'2020-11-18 19:10:25 +00:00'),
( 5, 3, 2,'2020-11-18 19:22:38 +00:00'),
( 6, 1, 2,'2020-11-18 20:01:05 +00:00'),
( 7, 1, 1,'2020-11-19 07:11:23 +00:00'),
( 8, 1, 2,'2020-11-19 16:21:53 +00:00'), -- Asia/Kuala_Lumpur +8 should be in 20.11 (refer to the check_outs field in the results output)
( 9, 1, 1,'2020-11-19 19:11:23 +00:00'), -- Asia/Kuala_Lumpur +8 should be in 20.11 (refer to the check_ins field in the results output)
(10, 1, 2,'2020-11-19 20:21:53 +00:00'), -- Asia/Kuala_Lumpur +8 should be in 20.11 (refer to the check_outs field in the results output)
(11, 1, 1,'2020-11-20 07:41:38 +00:00'),
(12, 1, 2,'2020-11-20 08:52:01 +00:00');
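Here is a fiddle to test.
The query does not include rows 8-10 in the output for time zone Asia/Kuala_Lumpur (+8), though it should. The result shows only 11,12 in the "rows" field.
How can I fix the query so that it generates the report based on dates in a given time zone? (Meaning I can change Asia/Kuala_Lumpur to America/New_York, etc.)
I was told to do something like this: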
where created_at >= timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur'
and created_at < timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur' + interval '1 day'
But I don't know how to apply it. It does not seem to work properly in this fiddle. It should include rows 8, 9, 10, 11, 12, but only rows 8, 9, 10 show up.
Database design
Consider some modifications to your setup:
CREATE TABLE employee (
id int PRIMARY KEY -- !
, name text -- do NOT use char(n) !
, division_id int
);
CREATE TABLE attendance (
id int PRIMARY KEY --!
, employee_id int NOT NULL REFERENCES employee -- FK!
, activity_type int
, created_at timestamptz NOT NULL
);
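Defining a PK makes it easier to aggregate rows, because a PK in the GROUP BY clause covers the whole row.
I wouldn't use "name" as a column name. It's not descriptive. Any other column could just as well be named "name". Consider:
- Any downsides of using data type "text" for storing strings?
- How to implement a many-to-many relationship in PostgreSQL?
Query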
SELECT *
FROM ( -- complete employee/date grid for division in range
SELECT g.d::date AS the_date, id AS employee_id, name, division_id
FROM (
SELECT generate_series(MIN(created_at) AT TIME ZONE 'Asia/Kuala_Lumpur'
, MAX(created_at) AT TIME ZONE 'Asia/Kuala_Lumpur'
, interval '1 day')
FROM attendance
) g(d)
CROSS JOIN employee e
WHERE e.division_id = 1
) de
LEFT JOIN ( -- checkins & checkouts per employee/date for division in range
SELECT employee_id, ts::date AS the_date
, array_agg(id) as rows
, min(ts) FILTER (WHERE activity_type = 1) AS min_check_in
, max(ts) FILTER (WHERE activity_type = 2) AS max_check_out
, array_agg(ts::time) FILTER (WHERE activity_type = 1) AS check_ins
, array_agg(ts::time) FILTER (WHERE activity_type = 2) AS check_outs
FROM (
SELECT a.id, a.employee_id, a.activity_type, a.created_at AT TIME ZONE 'Asia/Kuala_Lumpur' AS ts -- convert to timestamp
FROM employee e
JOIN attendance a ON a.employee_id = e.id
-- WHERE a.created_at >= timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur' -- "sargable" expressions
-- AND a.created_at < timestamp '2020-11-21' AT TIME ZONE 'Asia/Kuala_Lumpur' -- exclusive upper bound (includes all of 2020-11-20);
AND e.division_id = 1
ORDER BY a.employee_id, a.created_at, a.activity_type -- optional to guarantee sorted arrays
) sub
GROUP BY 1, 2
) a USING (the_date, employee_id)
ORDER BY 1, 2;
db<>fiddle here
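To check the expected grouping outside the database, here is a minimal Python sketch of what the inner subquery and the GROUP BY do for employee 1 of the sample data. It is an illustration, not part of the query: `rows_by_local_date` is a hypothetical helper, and a fixed UTC+8 offset is assumed as a stand-in for 'Asia/Kuala_Lumpur' (that zone had no DST in 2020).

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset stands in for 'Asia/Kuala_Lumpur'.
KL = timezone(timedelta(hours=8))

# (id, employee_id, activity_type, created_at in UTC): employee 1 rows of the sample data
ATTENDANCE = [
    (1, 1, 1, "2020-11-18 07:10:25"),
    (6, 1, 2, "2020-11-18 20:01:05"),
    (7, 1, 1, "2020-11-19 07:11:23"),
    (8, 1, 2, "2020-11-19 16:21:53"),
    (9, 1, 1, "2020-11-19 19:11:23"),
    (10, 1, 2, "2020-11-19 20:21:53"),
    (11, 1, 1, "2020-11-20 07:41:38"),
    (12, 1, 2, "2020-11-20 08:52:01"),
]


def rows_by_local_date(rows, tz):
    """Group attendance ids by (employee_id, local calendar date in tz)."""
    grouped = defaultdict(list)
    for row_id, employee_id, _activity_type, created_at in rows:
        instant = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S").replace(
            tzinfo=timezone.utc
        )
        local_ts = instant.astimezone(tz)  # roughly: created_at AT TIME ZONE '...'
        key = (employee_id, local_ts.date().isoformat())  # roughly: ts::date
        grouped[key].append(row_id)
    return dict(grouped)


report = rows_by_local_date(ATTENDANCE, KL)
for (employee_id, the_date), ids in sorted(report.items()):
    print(employee_id, the_date, ids)
```

Rows 8 to 12 all land on the local date 2020-11-20, which is the result the question asks for.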
Note that my query outputs local date and time for Asia/Kuala_Lumpur:
test=> SELECT timestamptz '2020-11-20 08:52:01 +0' AT TIME ZONE 'Asia/Kuala_Lumpur' AS local_ts;
local_ts
---------------------
2020-11-20 16:52:01
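Where to start? You need to understand time zones and the concept of the Postgres data types timestamp with time zone (timestamptz) vs. timestamp without time zone (timestamp). Otherwise, there will be no end to the confusion. Start here:
- Ignoring time zones altogether in Rails and PostgreSQL
Most notably, timestamptz does not store a time zone.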
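As a rough analogy in Python (not Postgres itself): a timestamptz value identifies a point in time, and the offset written in the input literal only serves to compute that point. The sketch below assumes a fixed UTC+8 offset in place of 'Asia/Kuala_Lumpur' and shows that two literals with different offsets denote the same instant once normalized to UTC.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset stands in for 'Asia/Kuala_Lumpur'.
KL = timezone(timedelta(hours=8))

# The same point in time, written with two different offsets.
a = datetime(2020, 11, 20, 8, 52, 1, tzinfo=timezone.utc)
b = datetime(2020, 11, 20, 16, 52, 1, tzinfo=KL)

# Aware datetimes compare by instant, not by wall-clock fields.
same_instant = a == b

# Normalizing to UTC discards the offset of the original literal.
stored_a = a.astimezone(timezone.utc)
stored_b = b.astimezone(timezone.utc)

print(same_instant)
print(stored_a.isoformat(), stored_b.isoformat())
```

After normalization nothing remains that would tell you which offset the value was entered with, so the zone used for display or for deriving a date has to come from somewhere else: the session setting, or an explicit AT TIME ZONE.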
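When simply casting timestamptz to date or timestamp, the current time zone setting of the session is assumed. Not what you want. Use the AT TIME ZONE construct to provide the time zone explicitly and avoid this error. In your fiddle, you also have: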
...
, generate_series(
startdate::timestamp AT TIME ZONE 'Asia/Kuala_Lumpur',
enddate::timestamp AT TIME ZONE 'Asia/Kuala_Lumpur',
interval '1 day') g(d)
...
That does not do what you want, either. After the (wrong!) cast to timestamp, the AT TIME ZONE construct converts the value back to timestamptz.
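The effect of that double conversion can be sketched in Python. This is an analogy under stated assumptions, not Postgres code: the session is assumed to run with time zone UTC, and a fixed UTC+8 offset stands in for 'Asia/Kuala_Lumpur'. The value used is created_at of row 8 from the sample data.

```python
from datetime import datetime, timedelta, timezone

KL = timezone(timedelta(hours=8))  # assumption: stand-in for 'Asia/Kuala_Lumpur'
SESSION_TZ = timezone.utc          # assumption: the session runs with time zone UTC

instant = datetime(2020, 11, 19, 16, 21, 53, tzinfo=timezone.utc)  # row 8

# Like created_at::timestamp: wall-clock time in the session zone, offset dropped.
naive_in_session = instant.astimezone(SESSION_TZ).replace(tzinfo=None)

# Like <timestamp> AT TIME ZONE 'Asia/Kuala_Lumpur': the naive value is read
# as Kuala Lumpur wall-clock time, which yields a point in time again.
round_trip = naive_in_session.replace(tzinfo=KL)
shift = round_trip - instant  # error introduced by the double conversion

# Like created_at AT TIME ZONE 'Asia/Kuala_Lumpur': local wall-clock time.
local_ts = instant.astimezone(KL).replace(tzinfo=None)

print(shift)
print(local_ts)
```

Under these assumptions the round trip moves the value by eight hours in the wrong direction, while the direct conversion gives the local timestamp whose date is 2020-11-20.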
Also, your query generates the complete Cartesian product of all users and the maximum range of days in table attendance, only to reduce it to a single day with:
where created_at >= timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur'
and created_at < timestamp '2020-11-20' AT TIME ZONE 'Asia/Kuala_Lumpur' + interval '1 day'
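This WHERE clause finally does what it is supposed to do. But it makes no sense to generate the full range of days first, only to discard most of them. (You seem to have copied it from my other fiddle in the meantime?)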
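Why that range predicate selects the right rows can be checked with a small Python sketch. It computes the half-open interval of instants that make up one local calendar day and filters the employee 1 rows of the sample data with it. `local_day_bounds` is a hypothetical helper, and a fixed UTC+8 offset is assumed in place of 'Asia/Kuala_Lumpur'.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset stands in for 'Asia/Kuala_Lumpur'.
KL = timezone(timedelta(hours=8))


def local_day_bounds(day, tz):
    """Return (start, end) in UTC for the local calendar day; end is exclusive."""
    midnight = datetime.strptime(day, "%Y-%m-%d")  # naive local midnight
    start = midnight.replace(tzinfo=tz)
    end = (midnight + timedelta(days=1)).replace(tzinfo=tz)  # next local midnight
    return start.astimezone(timezone.utc), end.astimezone(timezone.utc)


# id -> created_at (UTC) for employee 1, taken from the sample data
CREATED_AT = {
    1: "2020-11-18 07:10:25",
    6: "2020-11-18 20:01:05",
    7: "2020-11-19 07:11:23",
    8: "2020-11-19 16:21:53",
    9: "2020-11-19 19:11:23",
    10: "2020-11-19 20:21:53",
    11: "2020-11-20 07:41:38",
    12: "2020-11-20 08:52:01",
}


def parse_utc(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string as a UTC instant."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)


start, end = local_day_bounds("2020-11-20", KL)
matching = [row_id for row_id, ts in CREATED_AT.items() if start <= parse_utc(ts) < end]

print(start.isoformat(), end.isoformat())
print(matching)
```

The bounds are computed once and compared against the stored value as is, so the comparison never has to convert created_at. That is what the "sargable" remark in the query comments refers to.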
I commented out the WHERE clause and kept an optimized version of your generate_series() in my query as proof of concept. Further reading:
- Generating time series between two dates in PostgreSQL