Insert-select gets a better plan when a LIMIT clause is added
This is the server I am running on:
select version();
version
---------------------------------------------------------------------------
PostgreSQL 10.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
(1 row)
I started by writing a plain select (ext.t_event and ext.t_event_data are two foreign tables fetched from a remote Oracle database through oracle_fdw, version 1.1):
select
te.id_data,
te.id_device,
te.date_write,
te.date_event,
ted.i_inout,
ted.value
from ext.t_event te, ext.t_event_data ted
where te.id_device =2749651
and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17'
and te.id_data=ted.id_data;
Fetching the whole record set (3600 records) takes about 10 seconds.
But then I turned the select into an insert-select:
insert into stg_data
select
te.id_data,
te.id_device,
te.date_write,
te.date_event,
ted.i_inout,
ted.value
from ext.t_event te, ext.t_event_data ted
where te.id_device =2749651
and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17'
and te.id_data=ted.id_data;
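I was forced to kill the query; it had been running for more than 30 minutes!
After a few hours of struggling and desperate attempts, I decided to try this: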
insert into stg_data
select
te.id_data,
te.id_device,
te.date_write,
te.date_event,
ted.i_inout,
ted.value
from ext.t_event te, ext.t_event_data ted
where te.id_device =2749651
and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17'
and te.id_data=ted.id_data
limit 5000;
And... surprisingly, within 20 seconds I had the whole record set stored in stg_data.
To better understand the difference, I decided to look at the plans.
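For reference, plans like the ones that follow can be printed with EXPLAIN. This is only a sketch: the post does not say which EXPLAIN options were used, so plain EXPLAIN is an assumption here. EXPLAIN without ANALYZE plans the statement but does not execute it, so nothing is inserted into stg_data.

```sql
-- Assumption: plain EXPLAIN (no ANALYZE, no VERBOSE); the original post does not state the options.
-- The statement is planned but not executed, so no rows are written.
explain
insert into stg_data
select
    te.id_data,
    te.id_device,
    te.date_write,
    te.date_event,
    ted.i_inout,
    ted.value
from ext.t_event te, ext.t_event_data ted
where te.id_device = 2749651
  and te.date_event >= '2019-01-16' and te.date_event < '2019-01-17'
  and te.id_data = ted.id_data;
```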
SELECT without LIMIT
Foreign Scan (cost=10000.00..20000.00 rows=1000 width=548)
Oracle query: SELECT /*eb01c463a72c3b6350f86f5db25e1353*/ r1."ID_DATA",
r1."ID_DEVICE", r1."DATE_WRITE", r1."DATE_EVENT", r2."I_INOUT",
r2."VALUE" FROM ("DISPATCH"."T_EVENT" r1 INNER JOIN
"DISPATCH"."T_EVENT_DATA" r2 ON (r1."ID_DATA" = r2."ID_DATA") AND
(r1."DATE_EVENT" >= (CAST ('2019-01-16 00:00:00.000000 AD' AS
TIMESTAMP))) AND (r1."DATE_EVENT" <
(CAST ('2019-01-17 00:00:00.000000 AD' AS TIMESTAMP)))
AND (r1."ID_DEVICE" = 2749651))
SELECT with LIMIT
Limit (cost=10000.00..20000.00 rows=1000 width=548)
-> Foreign Scan (cost=10000.00..20000.00 rows=1000 width=548)
Oracle query: SELECT /*eb01c463a72c3b6350f86f5db25e1353*/
r1."ID_DATA", r1."ID_DEVICE", r1."DATE_WRITE", r1."DATE_EVENT",
r2."I_INOUT", r2."VALUE" FROM ("DISPATCH"."T_EVENT" r1 INNER
JOIN "DISPATCH"."T_EVENT_DATA" r2 ON (r1."ID_DATA" = r2."ID_DATA")
AND (r1."DATE_EVENT" >= (CAST ('2019-01-16 00:00:00.000000 AD' AS
TIMESTAMP))) AND (r1."DATE_EVENT" < (CAST ('2019-01-17
00:00:00.000000 AD' AS TIMESTAMP))) AND (r1."ID_DEVICE" = 2749651))
So it basically sends the same query to Oracle and applies the LIMIT locally once the rows have been fetched.
Do the INSERT-SELECT plans look the same? No!
INSERT-SELECT with LIMIT
Insert on stg_data_hist (cost=10000.00..20010.00 rows=1000 width=548)
-> Limit (cost=10000.00..20000.00 rows=1000 width=548)
-> Foreign Scan (cost=10000.00..20000.00 rows=1000 width=548)
Oracle query: SELECT /*eb01c463a72c3b6350f86f5db25e1353*/
r1."ID_DATA", r1."ID_DEVICE", r1."DATE_WRITE",
r1."DATE_EVENT", r2."I_INOUT", r2."VALUE" FROM
("DISPATCH"."T_EVENT" r1 INNER JOIN
"DISPATCH"."T_EVENT_DATA" r2 ON (r1."ID_DATA" =
r2."ID_DATA") AND (r1."DATE_EVENT" >= (CAST ('2019-01-16
00:00:00.000000 AD' AS TIMESTAMP))) AND (r1."DATE_EVENT" <
(CAST('2019-01-17 00:00:00.000000 AD' AS TIMESTAMP))) AND
(r1."ID_DEVICE" = 2749651))
INSERT-SELECT without LIMIT clause
Insert on stg_data_hist (cost=30012.50..40190.00 rows=5000 width=548)
-> Hash Join (cost=30012.50..40190.00 rows=5000 width=548)
Hash Cond: (te.id_data = ted.id_data)
-> Foreign Scan on t_event te (cost=10000.00..20000.00 rows=1000 width=28)
Oracle query: SELECT /*93379c271b3f1bc08a1dbb94fb89f739*/
r3."ID_DATA", r3."ID_DEVICE", r3."DATE_WRITE", r3."DATE_EVENT"
FROM "DISPATCH"."T_EVENT" r3 WHERE (r3."DATE_EVENT" >=
(CAST ('2019-01-16 00:00:00.000000 AD' AS TIMESTAMP))) AND
(r3."DATE_EVENT" < (CAST ('2019-01-17 00:00:00.000000 AD' AS
TIMESTAMP))) AND (r3."ID_DEVICE" = 2749651)
-> Hash (cost=20000.00..20000.00 rows=1000 width=528)
-> Foreign Scan on t_event_data ted
(cost=10000.00..20000.00 rows=1000 width=528)
Oracle query: SELECT /*21c8741f2fa8a8d13d037c3191e8ac96*/
r4."ID_DATA", r4."I_INOUT", r4."VALUE" FROM
"DISPATCH"."T_EVENT_DATA" r4
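This explains why it takes so much longer than the other one. It retrieves the date-filtered records from one foreign table, retrieves the full set from the second foreign table, and performs the join locally. That will take forever! It is a few million records versus a few thousand.
Finally, my two questions:
1) I want the first plan, but without the LIMIT clause (it sends shivers down my spine :-)). How would you do it? I have no way to filter ext.t_event_data other than through the join clause.
2) Why do the two INSERT-SELECT plans look so different, while the two SELECT plans look so similar?
Thanks for reading, and have a nice day.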
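The planner seems to think it will get only a few thousand rows either way, which is clearly far off. Make sure the statistics of the foreign tables are up to date by running 'ANALYZE ext.t_event', and the same for ext.t_event_data, before running the query, because: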
https://github.com/laurenz/oracle_fdw
PostgreSQL will not automatically gather statistics for foreign tables with the autovacuum daemon.
Keep in mind that analyzing an Oracle foreign table will result in a full sequential table scan. You can use the table option sample_percent to speed this up by using only a sample of the Oracle table.
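As a rough sketch of that advice, the statements could look like the following. The 10 percent sample is an arbitrary value chosen for illustration, not a recommendation from the oracle_fdw documentation, and none of this has been run against the tables in the question.

```sql
-- Assumption: a 10 percent sample is good enough; tune the value to the size of the Oracle tables.
-- Use SET instead of ADD if sample_percent is already defined on the foreign table.
ALTER FOREIGN TABLE ext.t_event OPTIONS (ADD sample_percent '10');
ALTER FOREIGN TABLE ext.t_event_data OPTIONS (ADD sample_percent '10');

-- Statistics for foreign tables are only gathered by an explicit ANALYZE.
ANALYZE ext.t_event;
ANALYZE ext.t_event_data;
```

After that, running EXPLAIN on the insert again shows whether the row estimates and the plan have changed.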
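The join is pushed down to Oracle in the select case, and in the insert case when the limit is used, so the only reason I can see for it not being pushed down in the insert without limit is the lack of accurate table statistics. You could try rewriting the insert query with a CTE (this query is untested, for obvious reasons):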
WITH foreign_data AS (
select
te.id_data,
te.id_device,
te.date_write,
te.date_event,
ted.i_inout,
ted.value
from ext.t_event te, ext.t_event_data ted
where te.id_device =2749651
and te.date_event >= '2019-01-16'and te.date_event < '2019-01-17'
and te.id_data=ted.id_data
)
insert into stg_data select * from foreign_data;
You could also try rewriting the query with an explicit inner join instead of putting the join condition (te.id_data=ted.id_data) in the where clause.
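A sketch of that explicit-join variant, equally untested: it is the same query from the question with the join condition moved into an ON clause. Whether this alone makes the planner push the join down to Oracle is not guaranteed.

```sql
-- Untested sketch: same tables, columns and filters as in the question,
-- only the join is written with explicit INNER JOIN ... ON syntax.
insert into stg_data
select
    te.id_data,
    te.id_device,
    te.date_write,
    te.date_event,
    ted.i_inout,
    ted.value
from ext.t_event te
inner join ext.t_event_data ted
    on te.id_data = ted.id_data
where te.id_device = 2749651
  and te.date_event >= '2019-01-16'
  and te.date_event < '2019-01-17';
```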