Postgres FDW join query on local and foreign table is slow with ORDER BY and JSONB data
I have two tables: a local table debtors and a foreign table debtor_registry. I am using PostgreSQL v13.
My problem is that whenever I run the following query, it takes 14 seconds to fetch 1000 records.
SELECT
debtors.id,
debtors.name,
debtor_registry.settings
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
ORDER BY name LIMIT 1000 OFFSET 0;
To my surprise, whenever I remove the ORDER BY clause from the query, it becomes much faster and takes only 194 ms for 1000 records.
SELECT
debtors.id,
debtors.name,
debtor_registry.settings
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
LIMIT 1000 OFFSET 0;
In another case, if I remove settings (a JSONB field) from the query and keep the ORDER BY clause, fetching 1000 records takes only 101 ms.
SELECT
debtors.id,
debtors.name
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
ORDER BY name LIMIT 1000 OFFSET 0;
I suspect this may be related to the amount of data I am trying to fetch.
Here is the EXPLAIN ANALYZE VERBOSE output when the settings JSONB field, ORDER BY name, and LIMIT 1000 are all in the query:
Limit (cost=114722.78..114725.28 rows=1000 width=57) (actual time=13712.125..14002.827 rows=1000 loops=1)
  Output: debtors.id, debtors.name, debtor_registry.settings
  -> Sort (cost=114722.78..114725.63 rows=1140 width=57) (actual time=13703.171..13993.617 rows=1000 loops=1)
        Output: debtors.id, debtors.name, debtor_registry.settings
        Sort Key: debtors.name
        Sort Method: external merge Disk: 82752kB
        -> Hash Join (cost=896.60..114664.90 rows=1140 width=57) (actual time=14.889..917.360 rows=10550 loops=1)
              Output: debtors.id, debtors.name, debtor_registry.settings
              Hash Cond: (((debtor_registry.id)::character varying)::text = (debtors.registry_uuid)::text)
              -> Foreign Scan on public.debtor_registry (cost=100.00..113832.74 rows=1137 width=48) (actual time=8.845..902.466 rows=10529 loops=1)
                    Output: debtor_registry.id, debtor_registry.company_id, debtor_registry.settings, debtor_registry.product
                    Remote SQL: SELECT id, settings FROM public.company_debtor
              -> Hash (cost=664.60..664.60 rows=10560 width=62) (actual time=6.027..6.028 rows=10554 loops=1)
                    Output: debtors.id, debtors.name, debtors.registry_uuid
                    Buckets: 16384 Batches: 1 Memory Usage: 1108kB
                    -> Seq Scan on public.debtors (cost=0.00..664.60 rows=10560 width=62) (actual time=0.019..4.726 rows=10560 loops=1)
                          Output: debtors.id, debtors.name, debtors.registry_uuid
Planning Time: 0.098 ms
JIT:
  Functions: 10
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 1.609 ms, Inlining 0.000 ms, Optimization 0.674 ms, Emission 7.991 ms, Total 10.274 ms
Execution Time: 14007.113 ms
How can I make the first query faster without dropping the settings field, the ORDER BY name clause, or the LIMIT 1000?
UPDATE
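I also found this similar question, but the answer did not solve my problem, because our sorting is dynamic: we build the query based on requests from the front-end client.
Setting use_remote_estimate to 'true' did not help either. :(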
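For reference, postgres_fdw accepts use_remote_estimate either on the foreign server (applies to all its foreign tables) or on a single foreign table, with the table-level setting taking precedence. A minimal sketch, where registry_server is a hypothetical server name to be replaced with the real one:

```sql
-- Hypothetical server name: registry_server.
-- Server-wide: cost estimates for all foreign tables on this server come from remote EXPLAIN.
ALTER SERVER registry_server OPTIONS (ADD use_remote_estimate 'true');

-- Or only for this foreign table (overrides the server-level value).
-- Use SET instead of ADD if the option is already present.
ALTER FOREIGN TABLE debtor_registry OPTIONS (ADD use_remote_estimate 'true');
```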
Try:
with t as materialized
(
SELECT -- your second query as-is
debtors.id,
debtors.name,
debtor_registry.settings
FROM debtors
INNER JOIN debtor_registry ON debtor_registry.id = debtors.registry_uuid
LIMIT 1000 OFFSET 0
)
select * from t ORDER BY name;
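That is, keep the plan of the fast second query, then sort its result set. Be aware that this sorts whichever 1000 rows the unordered join happens to return, not the first 1000 by name, so it matches the original query only when the join yields no more than 1000 rows.
If your PostgreSQL version is older than 12, omit MATERIALIZED, because CTEs are always materialized there.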
Second suggestion: do the sort/limit locally to pre-select the right records, then pull the fat debtor_registry.settings for only those 1000 records.
with t as materialized
(
SELECT d.id, d.name, d.registry_uuid
FROM debtors d
ORDER BY d.name
LIMIT 1000 OFFSET 0
)
select t.id, t.name, debtor_registry.settings
FROM t INNER JOIN debtor_registry ON debtor_registry.id = t.registry_uuid
ORDER BY t.name;
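One caveat with the second suggestion: the INNER JOIN runs after the LIMIT, so any pre-selected debtor without a matching debtor_registry row is dropped and the page can come back with fewer than 1000 rows (the plan above produces 10550 joined rows from 10560 debtors, so some debtors apparently have no match). A variant that keeps the original inner-join semantics is sketched below, assuming debtor_registry.id is unique. The CTE has the shape of the fast third query (ORDER BY without settings, about 101 ms in the question), and settings is fetched only for the selected page. It has not been run against this schema; check with EXPLAIN (ANALYZE, VERBOSE) whether the second foreign scan still pulls settings for every remote row.

```sql
-- Sketch: assumes debtor_registry.id is unique, as the original join implies.
WITH page AS MATERIALIZED (
    -- Step 1: sort and limit without the JSONB column; the remote side
    -- only has to return debtor_registry.id for this join.
    SELECT d.id, d.name, d.registry_uuid
    FROM debtors d
    INNER JOIN debtor_registry r ON r.id = d.registry_uuid
    ORDER BY d.name
    LIMIT 1000 OFFSET 0
)
-- Step 2: fetch the wide settings column only for the selected rows.
SELECT p.id, p.name, r.settings
FROM page p
INNER JOIN debtor_registry r ON r.id = p.registry_uuid
ORDER BY p.name;
```

Because the sort order is built dynamically from the client request, the same ORDER BY expression has to be emitted in both the CTE and the outer query.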