PostgreSQL choosing a hash join rather than an index scan

I am running PostgreSQL 12.6. I have a table delivery_info with 14007206 rows (I removed the irrelevant parts of the DDL for brevity):

create table if not exists delivery_info
(
    id bigserial not null
        constraint delivery_info_pkey
            primary key,
    user_notification_id bigint
        constraint delivery_info__user_notification_id__fkey
            references notification_user
                on delete cascade,
    ...
    acknowledged boolean default false not null,
    status_change_date timestamp not null,
    ...
);

...

create index if not exists delivery_info__user_notification_id__index
    on delivery_info (user_notification_id);

create index if not exists delivery_info__status_change_date_acknowledged__index
    on delivery_info (status asc, status_change_date desc, acknowledged asc)
    where (status = 1);

And notification_user has 35503013 rows:

create table if not exists notification_user
(
    id bigserial not null
        constraint notification_user__id__seq
            primary key,
    ...
);

The query in question joins the two tables, with a WHERE clause on delivery_info:

SELECT *
FROM delivery_info AS d
INNER JOIN notification_user AS n ON d.user_notification_id = n.id
WHERE d.status = 1 AND d.acknowledged = false AND d.status_change_date < '2021-04-16T13:48:00.2234239Z';

Usually the engine chooses a hash join, plus an index scan on delivery_info:

Gather  (cost=1211782.75..1987611.05 rows=2293631 width=122) (actual time=49921.908..123141.788 rows=2790 loops=1)
  Workers Planned: 4
  Workers Launched: 4
  Buffers: shared hit=24996 read=412218
  I/O Timings: read=317223.835
  ->  Parallel Hash Join  (cost=211782.75..758247.95 rows=573408 width=122) (actual time=49923.633..123072.227 rows=558 loops=5)
        Hash Cond: (n.id = d.user_notification_id)
        Buffers: shared hit=24993 read=412218
        I/O Timings: read=317223.835
        ->  Parallel Seq Scan on notification_user n  (cost=0.00..511671.22 rows=8896122 width=75) (actual time=9.874..90448.053 rows=7100603 loops=5)
              Buffers: shared hit=10492 read=412218
              I/O Timings: read=317223.835
        ->  Parallel Hash  (cost=204615.15..204615.15 rows=573408 width=47) (actual time=210.255..210.262 rows=558 loops=5)
              Buckets: 4194304  Batches: 1  Memory Usage: 33056kB
              Buffers: shared hit=14386
              ->  Parallel Bitmap Heap Scan on delivery_info d  (cost=43803.04..204615.15 rows=573408 width=47) (actual time=187.358..188.670 rows=558 loops=5)
                    Recheck Cond: ((status_change_date < '2021-04-16 13:48:00.223424'::timestamp without time zone) AND (status = 1))
                    Filter: (NOT acknowledged)
                    Heap Blocks: exact=87
                    Buffers: shared hit=14386
                    ->  Bitmap Index Scan on delivery_info__status_change_date_acknowledged__index  (cost=0.00..43229.63 rows=2293631 width=0) (actual time=182.445..182.447 rows=2790 loops=1)
                          Index Cond: ((status_change_date < '2021-04-16 13:48:00.223424'::timestamp without time zone) AND (acknowledged = false))
                          Buffers: shared hit=14259
Planning Time: 57.240 ms
Execution Time: 123147.866 ms

The same query, analyzed with SET enable_seqscan = off:

Gather  (cost=1043803.60..2525242.24 rows=2293631 width=122) (actual time=156.124..186.178 rows=2790 loops=1)
  Workers Planned: 4
  Workers Launched: 4
  Buffers: shared hit=28349
  ->  Nested Loop  (cost=43803.60..1295879.14 rows=573408 width=122) (actual time=124.191..137.654 rows=558 loops=5)
        Buffers: shared hit=28349
        ->  Parallel Bitmap Heap Scan on delivery_info d  (cost=43803.04..204615.15 rows=573408 width=47) (actual time=124.141..125.410 rows=558 loops=5)
              Recheck Cond: ((status_change_date < '2021-04-16 13:48:00.223424'::timestamp without time zone) AND (status = 1))
              Filter: (NOT acknowledged)
              Heap Blocks: exact=57
              Buffers: shared hit=14386
              ->  Bitmap Index Scan on delivery_info__status_change_date_acknowledged__index  (cost=0.00..43229.63 rows=2293631 width=0) (actual time=155.243..155.245 rows=2790 loops=1)
                    Index Cond: ((status_change_date < '2021-04-16 13:48:00.223424'::timestamp without time zone) AND (acknowledged = false))
                    Buffers: shared hit=14259
        ->  Index Scan using notification_user__id__seq on notification_user n  (cost=0.56..1.90 rows=1 width=75) (actual time=0.007..0.007 rows=1 loops=2790)
              Index Cond: (id = d.user_notification_id)
              Buffers: shared hit=13963
Planning Time: 1.061 ms
Execution Time: 190.706 ms

I also noticed that if I put a lower bound on delivery_info.status_change_date ((d.status_change_date < '2021-04-16T13:48:00.2234239Z') AND (d.status_change_date > '2021-04-10T13:48:00.2234239Z')), the query planner apparently decides that the filter on delivery_info has become selective enough that using the index on notification_user is appropriate:

Nested Loop  (cost=0.99..468852.99 rows=155590 width=122) (actual time=0.074..243.885 rows=2790 loops=1)
  Buffers: shared hit=28412
  ->  Index Scan using delivery_info__status_change_date_acknowledged__index on delivery_info d  (cost=0.43..132601.82 rows=155590 width=47) (actual time=0.037..203.074 rows=2790 loops=1)
        Index Cond: ((status_change_date < '2021-04-16 13:48:00.223424'::timestamp without time zone) AND (status_change_date > '2021-04-10 13:48:00.223424'::timestamp without time zone) AND (acknowledged = false))
        Buffers: shared hit=14452
  ->  Index Scan using notification_user__id__seq on notification_user n  (cost=0.56..2.16 rows=1 width=75) (actual time=0.007..0.007 rows=1 loops=2790)
        Index Cond: (id = d.user_notification_id)
        Buffers: shared hit=13960
Planning Time: 16.755 ms
Execution Time: 248.475 ms

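Both queries (with and without the lower bound on delivery_info.status_change_date) return 2790 rows. Clearly the problem is that the query planner assumes the clause on status_change_date is very unselective, even though relatively few rows satisfy all the clauses of the query together. How can I fix this behavior? I don't want to put a lower bound on status_change_date.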
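
A back-of-envelope check of why the misestimate flips the plan choice: the sketch below plugs cost figures from the two EXPLAIN outputs above into a deliberately simplified model (nested loop = outer bitmap scan cost plus 1.90 per primary-key probe; hash join = at least the full parallel seq scan of notification_user). It ignores the Gather node, per-tuple CPU charges and the hash build, so it is an illustration of the trade-off, not the planner's actual formula.

```python
# Simplified cost model built from numbers in the EXPLAIN output above.
# It ignores the Gather node, per-tuple CPU charges and the hash build,
# so it only approximates what the planner actually computes.

OUTER_BITMAP_SCAN = 204615.15   # Parallel Bitmap Heap Scan on delivery_info d (total cost)
INNER_INDEX_PROBE = 1.90        # Index Scan on notification_user n, cost per loop
SEQ_SCAN_USERS = 511671.22      # Parallel Seq Scan on notification_user n (total cost)


def nested_loop_cost(rows_per_worker: float) -> float:
    """Outer bitmap scan plus one primary-key probe per outer row."""
    return OUTER_BITMAP_SCAN + rows_per_worker * INNER_INDEX_PROBE


def hash_join_floor() -> float:
    """A hash join must at least read all of notification_user."""
    return SEQ_SCAN_USERS


estimated_rows = 573408  # planner's per-worker row estimate
actual_rows = 558        # actual rows per worker (rows=558 loops=5)

for label, rows in (("estimated", estimated_rows), ("actual", actual_rows)):
    print(f"{label:>9} rows: nested loop ~{nested_loop_cost(rows):,.0f}, "
          f"hash join >= {hash_join_floor():,.0f}")
```

With the planner's 573408 rows per worker, the nested loop comes out around 1.29 million (close to the 1295879 the planner reports), far above the hash join; with the actual 558 rows it drops to roughly 206 thousand, well under the cost of the sequential scan alone. The row estimate, not the cost settings, drives the choice.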

I ran VACUUM ANALYZE on delivery_info, and I also checked seq_page_cost and random_page_cost (both are set to 1). I tried increasing the STATISTICS target on status_change_date, and increasing default_statistics_target before running ANALYZE, but neither helped.

Edit: Following @jjanes's suggestion, I added the actual and estimated counts for different combinations of the WHERE clause expressions:

clause                                                                                              actual      estimated
d.status = 1 AND d.acknowledged = false AND d.status_change_date < '2021-04-16T13:48:00.2234239Z'   2790        2295101
d.status = 1 AND d.acknowledged = false AND d.status_change_date > '2021-04-16T13:48:00.2234239Z'   119         571
d.status = 1 AND d.acknowledged != false AND d.status_change_date < '2021-04-16T13:48:00.2234239Z'  2891204     596341
d.status = 1 AND d.acknowledged != false AND d.status_change_date > '2021-04-16T13:48:00.2234239Z'  0           148
d.status != 1 AND d.acknowledged = false AND d.status_change_date < '2021-04-16T13:48:00.2234239Z'  11113008    8820447
d.status != 1 AND d.acknowledged = false AND d.status_change_date > '2021-04-16T13:48:00.2234239Z'  3           2193
d.status != 1 AND d.acknowledged != false AND d.status_change_date < '2021-04-16T13:48:00.2234239Z' 82          2291834
d.status != 1 AND d.acknowledged != false AND d.status_change_date > '2021-04-16T13:48:00.2234239Z' 0           570

The estimated counts seem wildly inaccurate. I have already run ANALYZE; what am I missing?

The first thing that caught my attention is this index:

create index if not exists delivery_info__status_change_date_acknowledged__index
    on delivery_info (status asc, status_change_date desc, acknowledged asc)
    where (status = 1);

Adding "status asc" makes no sense when every row in the index has the same value: "where (status = 1)".

I would merge the two indexes and try this first:

create index if not exists delivery_info__status_change_date_acknowledged__index
    on delivery_info (status_change_date desc, user_notification_id, acknowledged)
    where (status = 1);

Another thing that might help is creating some extended statistics (CREATE STATISTICS).

Adding a column to your index and reordering its columns should help.

Your query's WHERE clause applies these filters to your delivery_info table:

WHERE d.status = 1
  AND d.acknowledged = false
  AND d.status_change_date < timeconstant;

Then it uses d.user_notification_id as a foreign key to access your other table.

A good way to help this query is to create a covering BTREE index, like this:

create index if not exists delivery_info__status_change_date_acknowledged__index
    on delivery_info (status, 
                      acknowledged, 
                      status_change_date desc, 
                      user_notification_id) 
 where (status = 1);

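Why? The query can random-access the index to the first qualifying entry, then get everything it needs by scanning the index from there. As a bonus, the values you use for the fk will come out in ascending order, matching the pk on the table you're joining. Hopefully that will allow a merge join instead of a hash join.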
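
The merge-join hope depends on both inputs arriving sorted by the join key. Below is a minimal merge-join sketch on hypothetical data, assuming the notification_user side has a unique key. One caveat worth checking: with status_change_date ahead of user_notification_id in the proposed index, a range scan returns the fk values sorted only within each timestamp, so the planner would likely still need a Sort before a merge join. The sketch makes that explicit rather than claiming what PostgreSQL will choose.

```python
from typing import Iterator


def merge_join(left: list[tuple[int, str]],
               right: list[tuple[int, str]]) -> Iterator[tuple[int, str, str]]:
    """Inner merge join on the first element of each tuple.

    Both inputs must already be sorted by that key. The right side's key is
    assumed unique (like a primary key), so only the left cursor advances on
    a match.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            yield (left_key, left[i][1], right[j][1])
            i += 1


def is_sorted(keys: list[int]) -> bool:
    return all(a <= b for a, b in zip(keys, keys[1:]))


# Hypothetical index entries in (status_change_date DESC, user_notification_id) order.
index_entries = [("2021-04-15", 7), ("2021-04-15", 9),
                 ("2021-04-14", 3), ("2021-04-14", 8)]
fk_stream = [(fk, date) for date, fk in index_entries]

# Hypothetical notification_user rows, already in primary-key order.
notification_user = [(i, f"user-{i}") for i in range(1, 11)]

print(is_sorted([fk for fk, _ in fk_stream]))  # False: sorted only within each date
joined = list(merge_join(sorted(fk_stream), notification_user))  # explicit sort first
print([fk for fk, _, _ in joined])  # [3, 7, 8, 9]
```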

Your acknowledged column should come before your status_change_date column in the index, because you filter the former for equality and the latter for a range.
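
To see why equality columns should precede range columns, here is a toy model of a B-tree as a sorted Python list of hypothetical (acknowledged, day) keys, searched with bisect. It is not how PostgreSQL implements B-trees, only the ordering argument: with the equality column first, the matching entries form one contiguous run; with the range column first, the scan covers every entry in the date range and has to filter on acknowledged afterwards.

```python
from bisect import bisect_left

# Hypothetical toy data: one acknowledged and one unacknowledged row per day.
rows = [(ack, day) for day in range(100) for ack in (False, True)]
CUTOFF = 30  # stands in for status_change_date < timeconstant


def scan_equality_first() -> tuple[int, int]:
    """Key order (acknowledged, day): matches form one contiguous run."""
    index = sorted(rows)  # tuples sort by acknowledged, then day
    lo = bisect_left(index, (False,))         # first entry with acknowledged = False
    hi = bisect_left(index, (False, CUTOFF))  # first entry with day >= CUTOFF
    run = index[lo:hi]
    return len(run), len(run)  # (entries scanned, entries matched)


def scan_range_first() -> tuple[int, int]:
    """Key order (day, acknowledged): the range bounds the scan, then filter."""
    index = sorted((day, ack) for ack, day in rows)
    hi = bisect_left(index, (CUTOFF,))        # first entry with day >= CUTOFF
    run = index[:hi]
    matched = [entry for entry in run if not entry[1]]
    return len(run), len(matched)


print(scan_equality_first())  # (30, 30)
print(scan_range_first())     # (60, 30)
```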

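Pro tip: SELECT * can hurt performance in cases like this, because it forces the query to retrieve columns you may not need. List only the columns you need in the SELECT clause.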

You use a

SELECT *

in your query, and you gave us only part of the table structure.

No index will be fully effective unless all the columns used in the query are in the index definition.

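So the question is: do you really need to return all the columns? If yes, your index must contain all the columns of the table, and for that you have to use the new INCLUDE clause (invented for Microsoft SQL Server). If not, restrict the column list in the SELECT clause of your statement to the minimal subset of columns you need.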

By the way, please always provide the complete DDL.

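What I meant was that you should get counts for different combinations of leaving clauses out, not negating them. But we can still reach the same conclusion with a bit more math. These columns clearly depend strongly on each other: rows with status 1 have a far higher acknowledged rate (99.9%) than rows overall (20%). The planner, however, generally assumes that columns are independent.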
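
The arithmetic can be reproduced from the actual counts in the question's edit. The sketch below (plain Python, no database) rebuilds the marginal frequencies, multiplies them the way an independence-assuming planner would, and compares that with an estimate that uses the joint frequency of (status, acknowledged), which is roughly the information an MCV extended statistic supplies. It is a simplified model of the selectivity math, not PostgreSQL's actual code.

```python
from typing import Optional

# Actual counts from the question's edit, keyed by
# (status == 1, acknowledged == false, status_change_date < cutoff).
actual = {
    (True, True, True): 2790,
    (True, True, False): 119,
    (True, False, True): 2891204,
    (True, False, False): 0,
    (False, True, True): 11113008,
    (False, True, False): 3,
    (False, False, True): 82,
    (False, False, False): 0,
}
total = sum(actual.values())


def count(status1: Optional[bool] = None, ack_false: Optional[bool] = None,
          before: Optional[bool] = None) -> int:
    """Sum the cells matching the given flags (None means 'any value')."""
    return sum(
        n for (s, a, b), n in actual.items()
        if (status1 is None or s == status1)
        and (ack_false is None or a == ack_false)
        and (before is None or b == before)
    )


p_status1 = count(status1=True) / total
p_ack_false = count(ack_false=True) / total
p_before = count(before=True) / total

# Acknowledged rate among status = 1 rows vs. among all rows.
ack_rate_status1 = count(status1=True, ack_false=False) / count(status1=True)
ack_rate_overall = count(ack_false=False) / total

# Estimate assuming the three columns are independent.
independent_estimate = total * p_status1 * p_ack_false * p_before
# Estimate using the joint (status, acknowledged) frequency times the date selectivity.
joint_estimate = count(status1=True, ack_false=True) * p_before

print(f"total rows: {total:,}")
print(f"acknowledged rate: status=1 {ack_rate_status1:.1%}, overall {ack_rate_overall:.1%}")
print(f"independent estimate: {independent_estimate:,.0f}")
print(f"joint estimate: {joint_estimate:,.0f}, actual: {count(True, True, True):,}")
```

The eight cells sum to exactly the 14007206 rows of the table. The independence estimate lands around 2.3 million, close to the 2295101 the planner reported, while the joint estimate is about 2.9 thousand, close to the actual 2790.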

This is exactly what custom statistics are for. In this case you want:

CREATE STATISTICS foobar (MCV) ON acknowledged, status FROM delivery_info;
ANALYZE delivery_info;

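When I generated data with the distribution you showed, this fixed the estimates like magic. But it relies on there being relatively few distinct status values. (For the status != 1 rows I used values uniformly distributed from 2 to 11.) You said you tried creating statistics, but you didn't say exactly what you tried, or whether you analyzed the table after creating them.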

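Custom statistics don't have multidimensional histograms. In this case what you are relying on is the most common values (MCV) list. But the timestamp is essentially a continuous variable (well, I'm assuming so; maybe you truncate it to the month or something, but I doubt it), and trying to compute MCVs over a continuous variable is hopeless. So including the timestamp would make the custom statistics useless. It's basically the same reason they wouldn't work if "status" had a large number of distinct values.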
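
A rough illustration of why a continuous column defeats an MCV list: the sketch measures what fraction of a sample the most common values cover. The sample is synthetic (status 1 and acknowledged for one row in five, status 2 to 11 and unacknowledged otherwise, every timestamp distinct). The cap of 100 entries and the 30,000-row sample (ANALYZE samples 300 times the statistics target) are assumptions standing in for PostgreSQL's defaults; the real MCV builder also applies frequency thresholds that this sketch skips.

```python
from collections import Counter

MCV_SIZE = 100        # assumption: MCV list capped at the default statistics target
SAMPLE_ROWS = 30_000  # assumption: 300 * default statistics target rows sampled


def mcv_coverage(values: list) -> float:
    """Fraction of the sample represented by the MCV_SIZE most common values."""
    top = Counter(values).most_common(MCV_SIZE)
    return sum(freq for _, freq in top) / len(values)


# Hypothetical sample: (status, acknowledged) pairs take only a few distinct values...
pairs = [(1, True) if i % 5 == 0 else (2 + (i // 5) % 10, False)
         for i in range(SAMPLE_ROWS)]
# ...while timestamps (seconds since some epoch) are all distinct.
timestamps = [1_600_000_000 + 37 * i for i in range(SAMPLE_ROWS)]
triples = [(s, a, t) for (s, a), t in zip(pairs, timestamps)]

print(f"(status, acknowledged): {mcv_coverage(pairs):.1%}")  # 100.0%
print(f"(status, acknowledged, status_change_date): {mcv_coverage(triples):.1%}")  # 0.3%
```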