MERGE JOIN on two indexes still causing a SORT?
This is a performance question reduced to joining two indexes. Take the following setup:
CREATE TABLE ZZ_BASE AS SELECT dbms_random.random AS ID, DBMS_RANDOM.STRING('U',10) AS STR FROM DUAL CONNECT BY LEVEL <=1000000;
CREATE INDEX ZZ_B_I ON ZZ_BASE(ID ASC);
CREATE TABLE ZZ_CHILD AS SELECT dbms_random.random AS ID, DBMS_RANDOM.STRING('U',10) AS STR FROM DUAL CONNECT BY LEVEL <=1000000;
CREATE INDEX ZZ_C_I ON ZZ_CHILD(ID ASC);
-- As @Flado pointed out, the following is required so index scanning can be done
ALTER TABLE ZZ_BASE MODIFY (ID CONSTRAINT NN_B NOT NULL);
ALTER TABLE ZZ_CHILD MODIFY (ID CONSTRAINT NN_C NOT NULL); -- not mandatory for the join below
Now I want to LEFT OUTER JOIN the two tables and output only the already-indexed ID column.
SELECT ZZ_BASE.ID
FROM ZZ_BASE
LEFT OUTER JOIN ZZ_CHILD ON (ZZ_BASE.ID = ZZ_CHILD.ID);
----------------------------------------------------------------------------------------
| Id  | Operation             | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |        |  1000K|  9765K|       |  4894   (2)| 00:00:30 |
|*  1 |  HASH JOIN OUTER      |        |  1000K|  9765K|    16M|  4894   (2)| 00:00:30 |
|   2 |   INDEX FAST FULL SCAN| ZZ_B_I |  1000K|  4882K|       |   948   (3)| 00:00:06 |
|   3 |   INDEX FAST FULL SCAN| ZZ_C_I |  1000K|  4882K|       |   948   (3)| 00:00:06 |
----------------------------------------------------------------------------------------
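As you can see, no table access is necessary, only index access. But common sense says that a hash join is not the best way to join these two indexes: if the two tables were much larger, a very large hash table would have to be built.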
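To make the concern concrete, here is a minimal sketch of a textbook in-memory hash left outer join, written in Python purely for illustration. It is not Oracle's implementation: the function name `hash_left_outer_join` is hypothetical, and building the table on the right-hand input is a choice made for this sketch only. What it shows is that the build phase reads one whole input before the first row can be returned.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def hash_left_outer_join(
    left: Iterable[int], right: Iterable[int]
) -> Iterator[Tuple[int, Optional[int]]]:
    """Left outer equi-join on integer keys using an in-memory hash table."""
    # Build phase: the whole right input is consumed before any row is produced.
    # (Building on the right side is an assumption of this sketch.)
    matches: Dict[int, int] = {}
    for key in right:
        matches[key] = matches.get(key, 0) + 1

    # Probe phase: stream the left input and look each key up in the table.
    for key in left:
        count = matches.get(key, 0)
        if count == 0:
            yield (key, None)  # preserved left row without a match
        else:
            for _ in range(count):
                yield (key, key)  # one output row per matching right row
```

The table holds one entry per distinct key of the build input, so its size grows with that input, which is the cost the paragraph above worries about.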
A more efficient approach would be to SORT-MERGE the two indexes.
SELECT /*+ USE_MERGE(ZZ_BASE ZZ_CHILD) */ ZZ_BASE.ID
FROM ZZ_BASE
LEFT OUTER JOIN ZZ_CHILD ON (ZZ_BASE.ID = ZZ_CHILD.ID);
-----------------------------------------------------------------------------------------
| Id  | Operation              | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |        |  1000K|  9765K|       |  6931   (3)| 00:00:42 |
|   1 |  MERGE JOIN OUTER      |        |  1000K|  9765K|       |  6931   (3)| 00:00:42 |
|   2 |   INDEX FULL SCAN      | ZZ_B_I |  1000K|  4882K|       |  2258   (2)| 00:00:14 |
|*  3 |   SORT JOIN            |        |  1000K|  4882K|    22M|  4673   (4)| 00:00:29 |
|   4 |    INDEX FAST FULL SCAN| ZZ_C_I |  1000K|  4882K|       |   948   (3)| 00:00:06 |
-----------------------------------------------------------------------------------------
But it seems that the second index is still sorted, even though it is already in sorted order ("If an index exists, then the database can avoid sorting the first data set. However, the database always sorts the second data set, regardless of indexes" [1]).
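Basically, what I want is a query that uses a SORT-MERGE join and starts outputting records immediately, i.e.:
- no HASH join, because it first has to build the hash table (with IO overhead if that table is stored on disk) and therefore does not output immediately.
- no NESTED LOOP join which, although it would output immediately, costs log(N) per index probe and incurs a large IO overhead from non-sequential index reads when the indexes are large.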
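The behaviour asked for above can be sketched as a textbook left outer merge join over two inputs that are already sorted. This is an illustration in Python, not Oracle's implementation; the name `merge_left_outer_join` is hypothetical and both inputs are assumed to arrive in ascending key order. Each output row is produced as soon as the two cursors have advanced far enough, and only the right-hand rows for the current key are buffered so they can be replayed for duplicate left keys.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def merge_left_outer_join(
    left: Iterable[int], right: Iterable[int]
) -> Iterator[Tuple[int, Optional[int]]]:
    """Left outer equi-join of two inputs that are both sorted ascending."""
    right_iter = iter(right)
    end = object()  # sentinel marking an exhausted right input
    current = next(right_iter, end)

    group_key: Optional[int] = None  # key the buffered group belongs to
    group: List[int] = []            # right rows equal to group_key

    for key in left:
        if key != group_key:
            # Skip right rows that are smaller than the current left key.
            while current is not end and current < key:
                current = next(right_iter, end)
            # Buffer the right rows that match the current left key.
            group_key, group = key, []
            while current is not end and current == key:
                group.append(current)
                current = next(right_iter, end)
        # A repeated left key reuses the buffered group without rereading.
        if group:
            for match in group:
                yield (key, match)
        else:
            yield (key, None)  # preserved left row without a match
```

Because the function is a generator, the first rows are available after reading only a few entries from each side, and memory use is bounded by the largest group of equal keys on the right rather than by the size of either input.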
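INDEX_ASC (or just INDEX) is the hint you might want to try, so that you can compare performance against real data.
I am a little surprised that you get any kind of index scan on the outer row source, since B*Tree indexes cannot find NULL keys (rows whose indexed columns are all NULL are not stored in the index) and ZZ_BASE has no NOT NULL constraint. Adding the constraint and hinting a bit more will get you a full scan in index order on the ZZ_C_I index. Unfortunately, this does not save you the SORT JOIN step, but at least that step should be much faster, O(n), since the data is already sorted.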
alter table zz_base modify (id not null);
SELECT
/*+ leading(zz_base) USE_MERGE(ZZ_CHILD)
index_asc(zz_base (id)) index(zz_child (id)) */ ZZ_BASE.ID
FROM ZZ_BASE left outer join ZZ_CHILD on zz_base.id=zz_child.id;
This query uses the following execution plan:
------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |  1000K|  9765K|       |  8241   (3)| 00:00:50 |
|   1 |  MERGE JOIN OUTER |        |  1000K|  9765K|       |  8241   (3)| 00:00:50 |
|   2 |   INDEX FULL SCAN | ZZ_B_I |  1000K|  4882K|       |  2258   (2)| 00:00:14 |
|*  3 |   SORT JOIN       |        |  1000K|  4882K|    22M|  5983   (3)| 00:00:36 |
|   4 |    INDEX FULL SCAN| ZZ_C_I |  1000K|  4882K|       |  2258   (2)| 00:00:14 |
------------------------------------------------------------------------------------