PostgreSQL 9.6 selects the wrong plan when aggregating on timestamp columns
I have a simple but fairly large table, log, with three columns: user_id, day, and hours.
user_id character varying(36) COLLATE pg_catalog."default" NOT NULL,
day timestamp without time zone,
hours double precision
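All columns are indexed.
The problem is that aggregation on the 'day' field is extremely slow. For example, this simple query takes about 146 seconds to complete: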
select min(day) from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf'
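EXPLAIN ANALYZE shows that Postgres scans the whole index on day (index_log_day) and filters out every entry that does not match user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf', which is completely counterintuitive: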
[
{
"Execution Time": 146502.05,
"Planning Time": 0.893,
"Plan": {
"Startup Cost": 789.02,
"Actual Rows": 1,
"Plans": [
{
"Startup Cost": 0.44,
"Actual Rows": 1,
"Plans": [
{
"Index Cond": "(log.day IS NOT NULL)",
"Startup Cost": 0.44,
"Scan Direction": "Forward",
"Plan Width": 8,
"Rows Removed by Index Recheck": 0,
"Actual Rows": 1,
"Node Type": "Index Scan",
"Total Cost": 1395792.54,
"Plan Rows": 1770,
"Relation Name": "log",
"Alias": "log",
"Parallel Aware": false,
"Actual Total Time": 146502.015,
"Output": [
"log.day"
],
"Parent Relationship": "Outer",
"Actual Startup Time": 146502.015,
"Schema": "public",
"Filter": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)",
"Actual Loops": 1,
"Rows Removed by Filter": 12665610,
"Index Name": "index_log_day"
}
],
"Node Type": "Limit",
"Plan Rows": 1,
"Parallel Aware": false,
"Actual Total Time": 146502.016,
"Output": [
"log.day"
],
"Parent Relationship": "InitPlan",
"Actual Startup Time": 146502.016,
"Plan Width": 8,
"Subplan Name": "InitPlan 1 (returns $0)",
"Actual Loops": 1,
"Total Cost": 789.02
}
],
"Node Type": "Result",
"Plan Rows": 1,
"Parallel Aware": false,
"Actual Total Time": 146502.019,
"Output": [
"$0"
],
"Actual Startup Time": 146502.019,
"Plan Width": 8,
"Actual Loops": 1,
"Total Cost": 789.03
},
"Triggers": []
}
]
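For readers following the plan: the Result / InitPlan / Limit / Index Scan shape is what PostgreSQL's min/max index optimization produces. As a rough sketch (my reading of the plan, not something stated in the original post), the planner treats min(day) as if the query had been written like this:

```sql
-- Approximate equivalent of the plan above: walk index_log_day in ascending
-- order and stop at the first row that passes the user_id filter.
SELECT day
FROM log
WHERE user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf'
  AND day IS NOT NULL
ORDER BY day ASC
LIMIT 1;
```

The Limit node's estimated cost (789.02) is about the full index scan cost (1395792.54) divided by the estimated 1770 matching rows, so the planner apparently assumes this user's rows are spread evenly through the index and that a match will turn up early. In reality 12665610 rows were discarded before the first match was found.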
Even stranger, an almost identical query runs perfectly well:
select min(hours) from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf'
Here Postgres first selects the entries with user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf' and then aggregates over them, which is obviously the right thing to do:
[
{
"Execution Time": 5.989,
"Planning Time": 1.186,
"Plan": {
"Partial Mode": "Simple",
"Startup Cost": 6842.66,
"Actual Rows": 1,
"Plans": [
{
"Startup Cost": 66.28,
"Plan Width": 8,
"Rows Removed by Index Recheck": 0,
"Actual Rows": 745,
"Plans": [
{
"Startup Cost": 0,
"Plan Width": 0,
"Actual Rows": 745,
"Node Type": "Bitmap Index Scan",
"Index Cond": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)",
"Plan Rows": 1770,
"Parallel Aware": false,
"Actual Total Time": 0.25,
"Parent Relationship": "Outer",
"Actual Startup Time": 0.25,
"Total Cost": 65.84,
"Actual Loops": 1,
"Index Name": "index_log_user_id"
}
],
"Recheck Cond": "((log.user_id)::text = 'ab056f5a-390b-41d7-ba56-897c14b679bf'::text)",
"Exact Heap Blocks": 742,
"Node Type": "Bitmap Heap Scan",
"Plan Rows": 1770,
"Relation Name": "log",
"Alias": "log",
"Parallel Aware": false,
"Actual Total Time": 5.793,
"Output": [
"day",
"hours",
"user_id"
],
"Lossy Heap Blocks": 0,
"Parent Relationship": "Outer",
"Actual Startup Time": 0.357,
"Total Cost": 6838.23,
"Actual Loops": 1,
"Schema": "public"
}
],
"Node Type": "Aggregate",
"Strategy": "Plain",
"Plan Rows": 1,
"Parallel Aware": false,
"Actual Total Time": 5.946,
"Output": [
"min(hours)"
],
"Actual Startup Time": 5.946,
"Plan Width": 8,
"Actual Loops": 1,
"Total Cost": 6842.67
},
"Triggers": []
}
]
There are two possible workarounds:
1) Rewrite the query as:
select user_id, min(day) from log where user_id = 'ac43a155-4fbb-49eb-a670-02c307eb3d4f' group by user_id
2) Introduce a composite (two-column) index, as suggested in "finding MAX(db_timestamp) query".
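A minimal sketch of option 2, assuming a composite index on (user_id, day) is acceptable for this table. The index name index_log_user_id_day is hypothetical and does not come from the original post:

```sql
-- Hypothetical index name; CONCURRENTLY avoids blocking writes while the
-- index is built on a large table (it cannot run inside a transaction block).
CREATE INDEX CONCURRENTLY index_log_user_id_day ON log (user_id, day);

-- With this index in place, the smallest day for one user is the first
-- index entry under that user_id, so the original query can stay unchanged.
select min(day) from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf';
```

Whether the planner actually picks the new index still depends on its statistics, so it is worth checking the plan with EXPLAIN after creating it.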
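They may look fine, but I consider both approaches workarounds (the first one is even a hack). Logically, if Postgres can choose a proper plan for 'hours', it should be able to do the same for 'day', but it does not. So this looks like a Postgres bug that occurs when aggregating timestamp fields, though I admit I may be missing something. Can someone tell me whether anything can be done here without workarounds, or whether this really is a Postgres bug that I should report?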
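UPD: I have reported this as a bug to the PostgreSQL bugs mailing list. I will let everyone know if it is accepted.
See this post, which experiments with the column order of indexes - PostgreSQL index not used for query on range
Another idea is: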
select min(day) from (
select day from log
where user_id = 'ac43a155-4fbb-49eb-a670-02c307eb3d4f'
) q
P.S. Also, can you confirm that autovacuum (verbose, analyze) has been run for the table?
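One way to check this, as a sketch: pg_stat_user_tables records when the table was last vacuumed and analyzed, and a manual VACUUM with the VERBOSE and ANALYZE options refreshes the planner statistics. The table name log is taken from the question; this assumes it lives in a schema on the search path.

```sql
-- When did (auto)vacuum and (auto)analyze last touch the table?
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'log';

-- Refresh the statistics manually and print per-table details.
VACUUM (VERBOSE, ANALYZE) log;
```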
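Min is an aggregate function, not an operator. The function has to be executed over all matching records.
The fields in the select part do not affect the plan. From ... join ... where ... group by ... order by - all of these are taken into account in the plan.
Try: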
select day from log where user_id = 'ab056f5a-390b-41d7-ba56-897c14b679bf'
order by user_id, day
limit 1
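I got a reply from the PostgreSQL team. They do not consider this a bug. There are workarounds for this case, many of which are mentioned in the original post and in the comments below it. My personal choice is the first option mentioned originally, because it does not require changing the indexes (which is not always possible). So the solution is to rewrite the query as: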
select user_id, min(day) from log where user_id = 'ac43a155-4fbb-49eb-a670-02c307eb3d4f' group by user_id
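To confirm that the rewritten query really avoids the slow index scan, its plan can be inspected the same way as the plans in the question. This is only a sketch; the EXPLAIN options below are my assumption about how the JSON plans in the question were produced.

```sql
-- Produces a JSON plan comparable to the ones shown in the question.
-- ANALYZE actually executes the query, so expect it to take as long as a
-- normal run.
EXPLAIN (ANALYZE, VERBOSE, FORMAT JSON)
select user_id, min(day)
from log
where user_id = 'ac43a155-4fbb-49eb-a670-02c307eb3d4f'
group by user_id;
```

As far as I understand, the rewrite helps because the planner does not apply its min/max index shortcut when a GROUP BY clause is present, so it falls back to selecting the user's rows first and aggregating over them, as in the min(hours) plan.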