Postgres FDW join query on local and foreign table is slow with ORDER BY and JSONB data

I have two tables: a local table debtors and a foreign table debtor_registry. I am using PostgreSQL v13.

My problem is that whenever I run the following query, it takes 14 seconds to fetch 1000 records:

SELECT 
    debtors.id,
    debtors.name,
    debtor_registry.settings
FROM debtors
    INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
ORDER BY name LIMIT 1000 OFFSET 0;

To my surprise, whenever I remove the ORDER BY clause from the query, it becomes much faster and takes only 194 ms for 1000 records:

SELECT 
    debtors.id,
    debtors.name,
    debtor_registry.settings
FROM debtors
    INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
LIMIT 1000 OFFSET 0;

In another scenario, if I instead remove the settings column (a JSONB field) from the query but keep the ORDER BY clause, fetching 1000 records takes only 101 ms:

SELECT 
    debtors.id,
    debtors.name
FROM debtors
    INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
ORDER BY name LIMIT 1000 OFFSET 0;

I suspect this may be related to the amount of data I am trying to fetch.

Here is the EXPLAIN ANALYZE VERBOSE output when the settings JSONB field, ORDER BY name, and LIMIT 1000 are all present in the query:

Limit  (cost=114722.78..114725.28 rows=1000 width=57) (actual time=13712.125..14002.827 rows=1000 loops=1)
  Output: debtors.id, debtors.name, debtor_registry.settings
  ->  Sort  (cost=114722.78..114725.63 rows=1140 width=57) (actual time=13703.171..13993.617 rows=1000 loops=1)
        Output: debtors.id, debtors.name, debtor_registry.settings
        Sort Key: debtors.name
        Sort Method: external merge  Disk: 82752kB
        ->  Hash Join  (cost=896.60..114664.90 rows=1140 width=57) (actual time=14.889..917.360 rows=10550 loops=1)
              Output: debtors.id, debtors.name, debtor_registry.settings
              Hash Cond: (((debtor_registry.id)::character varying)::text = (debtors.registry_uuid)::text)
              ->  Foreign Scan on public.debtor_registry  (cost=100.00..113832.74 rows=1137 width=48) (actual time=8.845..902.466 rows=10529 loops=1)
                    Output: debtor_registry.id, debtor_registry.company_id, debtor_registry.settings, debtor_registry.product
                    Remote SQL: SELECT id, settings FROM public.company_debtor
              ->  Hash  (cost=664.60..664.60 rows=10560 width=62) (actual time=6.027..6.028 rows=10554 loops=1)
                    Output: debtors.id, debtors.name, debtors.registry_uuid
                    Buckets: 16384  Batches: 1  Memory Usage: 1108kB
                    ->  Seq Scan on public.debtors  (cost=0.00..664.60 rows=10560 width=62) (actual time=0.019..4.726 rows=10560 loops=1)
                          Output: debtors.id, debtors.name, debtors.registry_uuid
Planning Time: 0.098 ms
JIT:
  Functions: 10
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 1.609 ms, Inlining 0.000 ms, Optimization 0.674 ms, Emission 7.991 ms, Total 10.274 ms
Execution Time: 14007.113 ms

How can I make the first query faster without omitting the settings field, the ORDER BY name clause, or the LIMIT 1000?

UPDATE

  1. I also found this similar question, but its answer does not solve my problem, since our sorting is dynamic and we build the query based on requests from the frontend client.

  2. Setting use_remote_estimate to 'true' did not help either. :(
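(For reference, this option is set on the foreign server or foreign table definition. A minimal sketch, assuming the postgres_fdw server is named debtor_server; the name is hypothetical and should be adjusted to your setup:

ALTER SERVER debtor_server
    OPTIONS (ADD use_remote_estimate 'true');

-- or, per foreign table:
ALTER FOREIGN TABLE debtor_registry
    OPTIONS (ADD use_remote_estimate 'true');

With this enabled, the planner issues EXPLAIN on the remote server to cost foreign scans instead of relying on local statistics.)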

Try:

with t as materialized
(
 SELECT -- your second query as-is
    debtors.id,
    debtors.name,
    debtor_registry.settings
 FROM debtors
    INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
 LIMIT 1000 OFFSET 0
)
select * from t ORDER BY name;

That is, keep the plan of the fast second query, and then sort its result set. If your PostgreSQL version is older than 12, omit the materialized keyword, since CTEs are always materialized there anyway.

Second suggestion: sort and limit locally first, so the right records are pre-selected, and only then pull the fat debtor_registry.settings for just those 1000 rows:

with t as materialized
(
 SELECT d.id, d.name, d.registry_uuid 
 FROM debtors d
 ORDER BY d.name
 LIMIT 1000 OFFSET 0
)
select t.id, t.name, debtor_registry.settings
FROM t INNER JOIN debtor_registry ON debtor_registry.id = t.registry_uuid
ORDER BY t.name;