PostgreSQL fdw table performance drops exponentially after exactly 5 identical queries

I have a PostgreSQL 13.3 database running on a CentOS machine. Nothing else was running on this machine while I performed these tests, and nothing else was accessing the database.

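The table jammerdal contains roughly 500,000 rows. I repeated the experiment with other tables containing roughly 50,000 rows. The results were the same, but the slowdown seems to correlate with the number of rows in the fdw table and with the number of fake ids generated.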

Running this:

CREATE OR REPLACE FUNCTION jegfatterintet() RETURNS TABLE (c BIGINT)
AS $$
DECLARE
    jammerdal_ids VARCHAR[];
    area_row RECORD;
    start TIMESTAMP;
BEGIN
    SELECT INTO jammerdal_ids ARRAY_AGG('id-'||x::VARCHAR) FROM generate_series(0,25) x;

    FOR area_row IN SELECT * FROM generate_series(1,10)
        LOOP
        SELECT INTO start clock_timestamp();
        --RAISE NOTICE '%: Start %.', clock_timestamp(), area_row;
        RETURN QUERY
              SELECT COUNT(*) FROM jammerdal WHERE id = ANY(jammerdal_ids);
        --RAISE NOTICE '%: End %.', clock_timestamp(), area_row;
        RAISE NOTICE '%: Duration is %.', area_row, clock_timestamp()-start;
    END LOOP;
    RAISE NOTICE '%: All done.', clock_timestamp();
END
$$ LANGUAGE plpgsql;    

SELECT * FROM jegfatterintet();

produces this output:

CREATE FUNCTION
NOTICE:  (1): Duration is 00:00:00.019555.
NOTICE:  (2): Duration is 00:00:00.001271.
NOTICE:  (3): Duration is 00:00:00.001089.
NOTICE:  (4): Duration is 00:00:00.00118.
NOTICE:  (5): Duration is 00:00:00.001035.
NOTICE:  (6): Duration is 00:00:02.954527.
NOTICE:  (7): Duration is 00:00:02.871185.
NOTICE:  (8): Duration is 00:00:02.812426.
NOTICE:  (9): Duration is 00:00:02.777037.
NOTICE:  (10): Duration is 00:00:02.90708.
NOTICE:  2021-09-07 11:21:53.577115+00: All done.
 c 
---
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
(10 rows)

Notice how the duration suddenly jumps from about 0.001 seconds (steps 2 to 5) to nearly 3 seconds from step 6 onward.

This only happens when jammerdal is a foreign (fdw) table, not when it is a local table. And it only happens when the array of ids is used.

If I change the function to:

CREATE OR REPLACE FUNCTION jegfatterintet() RETURNS TABLE (c BIGINT)
AS $$
DECLARE
    jammerdal_ids VARCHAR[];
    area_row RECORD;
    start TIMESTAMP;
BEGIN
    SELECT INTO jammerdal_ids ARRAY_AGG('id-'||x::VARCHAR) FROM generate_series(0,25) x;

    FOR area_row IN SELECT * FROM generate_series(1,10)
        LOOP
        SELECT INTO start clock_timestamp();
        --RAISE NOTICE '%: Start %.', clock_timestamp(), area_row;
        RETURN QUERY
              SELECT COUNT(*) FROM jammerdal WHERE id IN ('id-0', 'id-1', 'id-2', 'id-3', 'id-4', 'id-5', 'id-6', 'id-7', 'id-8', 'id-9', 'id-10', 'id-11', 'id-12', 'id-13', 'id-14', 'id-15', 'id-16', 'id-17', 'id-18', 'id-19', 'id-20', 'id-21', 'id-22', 'id-23', 'id-24', 'id-25'); --id = ANY(jammerdal_ids);
        --RAISE NOTICE '%: End %.', clock_timestamp(), area_row;
        RAISE NOTICE '%: Duration is %.', area_row, clock_timestamp()-start;
    END LOOP;
    RAISE NOTICE '%: All done.', clock_timestamp();
END
$$ LANGUAGE plpgsql;

SELECT * FROM jegfatterintet();

the output becomes:

CREATE FUNCTION
NOTICE:  (1): Duration is 00:00:00.028254.
NOTICE:  (2): Duration is 00:00:00.001768.
NOTICE:  (3): Duration is 00:00:00.001512.
NOTICE:  (4): Duration is 00:00:00.001426.
NOTICE:  (5): Duration is 00:00:00.001523.
NOTICE:  (6): Duration is 00:00:00.001389.
NOTICE:  (7): Duration is 00:00:00.001363.
NOTICE:  (8): Duration is 00:00:00.001364.
NOTICE:  (9): Duration is 00:00:00.001466.
NOTICE:  (10): Duration is 00:00:00.001454.
NOTICE:  2021-09-07 11:25:46.635762+00: All done.
 c 
---
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
(10 rows)

Can anyone explain this to me?

EDIT:

explain (ANALYZE, BUFFERS) select id from jammerdal where id = any('{id-0,id-1,id-2,id-3,id-4,id-5,id-6,id-7,id-8,id-9,id-10,id-11,id-12,id-13,id-14,id-15,id-16,id-17,id-18,id-19,id-20,id-21,id-22,id-23,id-24,id-25}');
                                                     QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Foreign Scan on jammerdal  (cost=100.00..62578.95 rows=26 width=37) (actual time=0.718..0.719 rows=0 loops=1)
 Planning Time: 0.153 ms
 Execution Time: 1.101 ms
(3 rows)

Here is the same EXPLAIN run directly on the "foreign" database:

explain (ANALYZE, BUFFERS) select id from jammerdal where id = any('{id-0,id-1,id-2,id-3,id-4,id-5,id-6,id-7,id-8,id-9,id-10,id-11,id-12,id-13,id-14,id-15,id-16,id-17,id-18,id-19,id-20,id-21,id-22,id-23,id-24,id-25}');
                                                                                        QUERY PLAN                                                                                        
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Only Scan using jammerdal_pkey on jammerdal  (cost=0.42..219.41 rows=26 width=37) (actual time=0.286..0.290 rows=0 loops=1)
   Index Cond: (id = ANY ('{id-0,id-1,id-2,id-3,id-4,id-5,id-6,id-7,id-8,id-9,id-10,id-11,id-12,id-13,id-14,id-15,id-16,id-17,id-18,id-19,id-20,id-21,id-22,id-23,id-24,id-25}'::text[]))
   Heap Fetches: 0
   Buffers: shared hit=81
 Planning:
   Buffers: shared hit=205
 Planning Time: 2.111 ms
 Execution Time: 0.423 ms
(8 rows)

By the way,

ANALYZE jammerdal;

had no effect.

EDIT 2: The problem apparently was that the fdw table did not use the index on id...

ALTER SERVER testdb  OPTIONS (ADD use_remote_estimate 'true');

did the trick!
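For reference, postgres_fdw also accepts this option on a single foreign table, and the current settings can be read from the catalogs. A minimal sketch, assuming the server is named testdb as above and that the option is not already set on the table (ADD raises an error if it is, in which case SET is the keyword to use):

```sql
-- Per-table alternative to the server-wide setting.
-- Assumption: use_remote_estimate is not yet set on this table.
ALTER FOREIGN TABLE jammerdal OPTIONS (ADD use_remote_estimate 'true');

-- Show the options currently stored for the server ...
SELECT srvname, srvoptions
FROM pg_foreign_server
WHERE srvname = 'testdb';

-- ... and for each foreign table.
SELECT ftrelid::regclass AS foreign_table, ftoptions
FROM pg_foreign_table;
```

A table-level value takes precedence over the server-level one, so this can be used to limit the extra remote EXPLAIN round trips to the tables that need them.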

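Since the query is inside a function, PostgreSQL caches the execution plan. This works according to a special heuristic:

  • For the first five executions, PostgreSQL generates a "custom plan" that uses the actual parameter values (jammerdal_ids)

  • At the sixth execution, PostgreSQL checks whether a "generic plan" (one that ignores the parameter values) is estimated to perform just as well

  • If so, the generic plan is used from the sixth execution on, which saves planning time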

In your case, the generic plan is apparently a bad one.
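The switch can also be observed outside the function with a prepared statement, which goes through the same plan cache. This is only a sketch that has not been run against your setup: the statement name count_ids is made up for illustration, and plan_cache_mode requires PostgreSQL 12 or later.

```sql
-- Hypothetical prepared statement mirroring the query in the function.
PREPARE count_ids(varchar[]) AS
    SELECT COUNT(*) FROM jammerdal WHERE id = ANY($1);

-- Run this at least six times and compare the plans.
-- Custom plans show the literal array; a generic plan shows $1 instead.
EXPLAIN (ANALYZE, VERBOSE)
EXECUTE count_ids(ARRAY['id-0', 'id-1', 'id-2']::varchar[]);

-- Workaround if the generic plan turns out to be the slow one:
-- always plan with the actual parameter values, for this session ...
SET plan_cache_mode = force_custom_plan;

-- ... or only for the function from the question.
ALTER FUNCTION jegfatterintet() SET plan_cache_mode = force_custom_plan;

DEALLOCATE count_ids;
```

Forcing custom plans costs planning time on every execution, so it is better treated as a diagnostic or a stopgap than as a fix for bad estimates.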

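Since you didn't show any EXPLAIN (ANALYZE, BUFFERS) output, we can only guess at the reason. But a good guess is that you forgot to ANALYZE the foreign table, so the statistics are off. So after running

ANALYZE jammerdal;

you should notice an improvement.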

(Note that PostgreSQL does not collect statistics for foreign tables automatically.)
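To check whether the local side has any statistics for the foreign table, you can look at the catalogs before and after the ANALYZE. A minimal sketch, assuming the table and column names from the question:

```sql
-- Collect statistics by sampling the remote table.
ANALYZE jammerdal;

-- relkind 'f' marks a foreign table; reltuples is the planner's row estimate
-- (0 or -1, depending on the version, if the table was never analyzed).
SELECT relname, relkind, reltuples
FROM pg_class
WHERE relname = 'jammerdal';

-- Column statistics for id; no row here means no statistics were gathered.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'jammerdal'
  AND attname = 'id';
```

These statistics are a snapshot, so the ANALYZE has to be repeated (for example from a scheduled job) when the remote data changes significantly.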