Join elimination not working in Oracle with sub queries

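I'm able to get join elimination to work for simple cases such as one-to-one relations, but not for slightly more complex scenarios. Ultimately I want to try Anchor Modeling, but first I need to find a way around this problem. I'm using Oracle 12c Enterprise Edition 12.1.0.2.0.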

DDL for my test case:

drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product       cascade constraints;

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk primary key(product_id, from_date)
);

Some sample data:

insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price values(1, date '2016-01-01', 10);
insert into product_price values(1, date '2016-02-01', 8);
insert into product_price values(1, date '2016-05-01', 5);

insert into product_price values(2, date '2016-02-01', 5);

insert into product_price values(4, date '2016-01-01', 10);

commit;

5NF views

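The first view does not compile; it fails with ORA-01799: a column may not be outer-joined to a subquery. Unfortunately, when I look at the online examples of Anchor Modeling, this is how most of the historized views are defined...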

create view product_5nf as
   select p.product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     )
     left join product_price pp on(
          pp.product_id = p.product_id
      and pp.from_date  = (select max(pp2.from_date) 
                             from product_price pp2 
                            where pp2.product_id = pp.product_id)
     );

Below is my attempt to fix it. When this view is used with a simple select of product_id, Oracle manages to eliminate product_color but not product_price.

create view product_5nf as
   select product_id
         ,pc.color
         ,pp.price 
     from product p
     left join product_color pc using(product_id)
     left join (select pp1.product_id, pp1.price 
                  from product_price pp1
                 where pp1.from_date  = (select max(pp2.from_date) 
                                           from product_price pp2 
                                          where pp2.product_id = pp1.product_id)
              )pp using(product_id);

select product_id
  from product_5nf;

----------------------------------------------------------
| Id  | Operation             | Name             | Rows  |
----------------------------------------------------------
|   0 | SELECT STATEMENT      |                  |     4 |
|*  1 |  HASH JOIN OUTER      |                  |     4 |
|   2 |   INDEX FAST FULL SCAN| PRODUCT_PK       |     4 |
|   3 |   VIEW                |                  |     3 |
|   4 |    NESTED LOOPS       |                  |     3 |
|   5 |     VIEW              | VW_SQ_1          |     5 |
|   6 |      HASH GROUP BY    |                  |     5 |
|   7 |       INDEX FULL SCAN | PRODUCT_PRICE_PK |     5 |
|*  8 |     INDEX UNIQUE SCAN | PRODUCT_PRICE_PK |     1 |
----------------------------------------------------------

The only solution I have found is to use a scalar subquery instead, like this:

create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,(select pp.price
             from product_price pp
            where pp.product_id = p.product_id
              and pp.from_date = (select max(from_date)
                                    from product_price pp2
                                   where pp2.product_id = pp.product_id)) as price
     from product p
     left join product_color pc on(
          pc.product_id = p.product_id
     );

select product_id
  from product_5nf;

---------------------------------------------------
| Id  | Operation            | Name       | Rows  |
---------------------------------------------------
|   0 | SELECT STATEMENT     |            |     4 |
|   1 |  INDEX FAST FULL SCAN| PRODUCT_PK |     4 |
---------------------------------------------------

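Now Oracle successfully eliminates the product_price table. However, scalar subqueries are implemented differently than joins, and the way they are executed simply doesn't allow me to get acceptable performance in a real-world scenario.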

TL;DR: How can I rewrite the view product_5nf so that Oracle successfully eliminates both dependent tables?

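I can't get the price join eliminated, but if you do the following, you can at least reduce the access for the price check to a single index: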

CREATE OR REPLACE view product_5nf as
select p.product_id
      ,pc.color
      ,pp.price 
 from product p
 left join product_color pc ON p.product_id = pc.product_id
 left join (select pp1.product_id, pp1.price 
              from (SELECT product_id,
                           price,
                           from_date,
                           max(from_date) OVER (PARTITION BY product_id) max_from_date
                    FROM   product_price) pp1
             where pp1.from_date = max_from_date) pp ON p.product_id = pp.product_id;

I think you have two problems here.

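First, join elimination only works in certain specific situations (PK-PK, PK-FK, etc.). It is not a general feature where you can LEFT JOIN to any row set that returns a single row per join key value and have Oracle eliminate the join.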

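Second, even if Oracle were advanced enough to do join elimination on any LEFT JOIN where it knew it could only get one row per join key value, Oracle does not yet support join elimination on LEFT JOINs based on a composite key (Oracle Support document 887553.1 says this is coming in R12.2).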
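To make the distinction concrete, here is a sketch against the question's tables (it is not part of the original answer). The first query joins on product_color's single-column primary key, the case the question reports Oracle does eliminate; the second joins on product_price's composite primary key (product_id, from_date), which according to the note above 12.1 does not eliminate. Actual plans depend on version and statistics, so treat the comments as expectations to verify, not guarantees.

```sql
-- Single-column key: product_color's PK is (product_id), so at most one row
-- matches each product row and the outer join is a candidate for elimination.
select p.product_id
  from product p
  left join product_color pc on pc.product_id = p.product_id;

-- Show the plan of the last statement run in this session
-- (in SQL*Plus, run "set serveroutput off" first).
select * from table(dbms_xplan.display_cursor(null, null, 'BASIC'));

-- Composite key: (product_id, from_date) is also unique, so this join also
-- returns at most one row per product, but per the note above 12.1 does not
-- eliminate outer joins on composite keys.
select p.product_id
  from product p
  left join product_price pp
    on pp.product_id = p.product_id
   and pp.from_date  = date '2016-05-01';

select * from table(dbms_xplan.display_cursor(null, null, 'BASIC'));
```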

One workaround you could consider is to materialize a view containing the latest row for each product_id, and then LEFT JOIN to the materialized view. Like this:

create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk  primary key (product_id, from_date )
);

-- Add a VIRTUAL column to PRODUCT_PRICE so that we can get all the data for 
-- the latest row by taking the MAX() of this column.
alter table product_price add ( sortable_row varchar2(80) generated always as ( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0'))  virtual not null );

-- Create a materialized view log so that the MV holding only the latest
-- row for each product_id can be refreshed fast on commit.
create materialized view log on product_price with sequence, primary key, rowid ( price  ) including new values;

-- Create the MV
create materialized view product_price_latest refresh fast on commit enable query rewrite as
SELECT product_id, max( lpad(product_id,10,'0') || to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,'0')) sortable_row
FROM   product_price
GROUP BY product_id;

-- Create a primary key on the MV, so we can do join elimination
alter table product_price_latest add constraint ppl_pk primary key ( product_id );

-- Insert the OP's test data
insert into product values(1);
insert into product values(2);
insert into product values(3);
insert into product values(4);

insert into product_color values(1, 'Red');
insert into product_color values(2, 'Green');

insert into product_price ( product_id, from_date, price ) values(1, date '2016-01-01', 10 );
insert into product_price ( product_id, from_date, price) values(1, date '2016-02-01', 8);
insert into product_price ( product_id, from_date, price) values(1, date '2016-05-01', 5);

insert into product_price ( product_id, from_date, price) values(2, date '2016-02-01', 5);

insert into product_price ( product_id, from_date, price) values(4, date '2016-01-01', 10);

commit;

-- Create the 5NF view using the materialized view
create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,to_date(substr(ppl.sortable_row,11,14),'YYYYMMDDHH24MISS') from_date
         ,to_number(substr(ppl.sortable_row,25)) price 
     from product p
     left join product_color pc on pc.product_id = p.product_id
     left join product_price_latest ppl on ppl.product_id = p.product_id 
;

-- The plan for this should not include any of the unnecessary tables.
select product_id from product_5nf;

-- Check the plan
SELECT *
FROM   TABLE (DBMS_XPLAN.display_cursor (null, null,
                                         'ALLSTATS LAST'));

------------------------------------------------
| Id  | Operation        | Name       | E-Rows |
------------------------------------------------
|   0 | SELECT STATEMENT |            |        |
|   1 |  INDEX FULL SCAN | PRODUCT_PK |      1 |
------------------------------------------------
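The sortable_row column works because every part has a fixed width: 10 characters of zero-padded product_id, 14 characters of YYYYMMDDHH24MISS date, then the zero-padded price. As a worked example (values taken from the sample data, not from the answer), the latest row for product 1 (2016-05-01, price 5) can be packed and unpacked with the same substr offsets the product_5nf view uses:

```sql
-- Pack one sample row the way the MV does, then unpack it the way the view does.
select packed
      ,to_date(substr(packed, 11, 14), 'YYYYMMDDHH24MISS') as from_date
      ,to_number(substr(packed, 25))                        as price
  from (select lpad(1, 10, '0')
            || to_char(date '2016-05-01', 'YYYYMMDDHH24MISS')
            || lpad(5, 10, '0') as packed
          from dual);
-- packed = '0000000001201605010000000000000005'
-- from_date = 2016-05-01, price = 5
```

Within one product_id group the prefix is identical and from_date is unique (it is part of the primary key), so MAX() is decided purely by the date part, which sorts chronologically as a string. One caveat: lpad truncates values longer than 10 characters, so a very long product_id or price would be cut off silently.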

From the question: "Now Oracle successfully eliminates the product_price table. However, scalar subqueries are implemented differently than joins, and the way they are executed simply doesn't allow me to get any acceptable performance in a real-world scenario."

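The cost-based optimizer (CBO) in Oracle 12.1 can perform a query transformation that unnests scalar subqueries, so performance may be just as good as with the LEFT JOIN you were after in your question.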

The trick is that you have to tweak it a bit.

First, make sure the scalar subquery returns max() without a GROUP BY, so that the CBO knows it cannot possibly get more than one row. (Otherwise it will not unnest it.)

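Second, you need to combine all the fields from product_price into a single scalar subquery; otherwise the CBO will unnest several times and join product_price once for each.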
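For contrast, here is a sketch of the pattern the second point warns against: a hypothetical view (product_5nf_split is an illustrative name, not from the answer) that respects the first point, since each subquery returns max() without GROUP BY, but gives each product_price column its own scalar subquery. According to the answer, the CBO would then unnest each subquery separately and join product_price more than once; the plan on your system may differ.

```sql
-- Anti-pattern sketch: one scalar subquery per product_price column.
create or replace view product_5nf_split as
   select p.product_id
         ,pc.color
         -- subquery 1: latest from_date for the product
         ,(select max(pp.from_date)
             from product_price pp
            where pp.product_id = p.product_id) as from_date
         -- subquery 2: price of the latest row (max() keeps it single-row)
         ,(select max(pp.price)
             from product_price pp
            where pp.product_id = p.product_id
              and pp.from_date = (select max(pp2.from_date)
                                    from product_price pp2
                                   where pp2.product_id = pp.product_id)) as price
     from product p
     left join product_color pc on pc.product_id = p.product_id;
```

Packing from_date and price into one composite_column, as the test case does, leaves only one scalar subquery to unnest.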

Here is a test case for Oracle 12.1 that shows this working.

drop view product_5nf;
drop table product_color cascade constraints;
drop table product_price cascade constraints;
drop table product       cascade constraints;


create table product(
   product_id number not null
  ,constraint product_pk primary key(product_id)
);

create table product_color(
   product_id  number         not null references product
  ,color       varchar2(10)   not null
  ,constraint product_color_pk primary key(product_id)
);

create table product_price(
   product_id  number   not null references product
  ,from_date   date     not null
  ,price       number   not null
  ,constraint product_price_pk  primary key (product_id, from_date )
);

insert into product ( product_id ) SELECT rownum FROM dual connect by rownum <= 100000;

insert into product_color ( product_id, color ) SELECT rownum, dbms_random.string('a',8) color FROM DUAL connect by rownum <= 100000;

--delete from product_price;
insert into product_price ( product_id, from_date, price ) SELECT product_id, trunc(sysdate) + dbms_random.value(-3,3) from_date, floor(dbms_random.value(50,120)/10)*10 price from product cross join lateral ( SELECT rownum x FROM dual connect by rownum <= mod(product_id,5));

commit;

begin dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT' ); end;
/
begin dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT_COLOR' ); end;
/
begin dbms_stats.gather_table_stats ( ownname => USER, tabname => 'PRODUCT_PRICE' ); end;
/

commit;

alter table product_price add ( composite_column varchar2(80) generated always as ( to_char(from_date,'YYYYMMDDHH24MISS') || lpad(price,10,0)) virtual );

create or replace view product_5nf as
   select d.product_id, d.color, to_date(substr(d.product_date_price,1,14),'YYYYMMDDHH24MISS') from_date, to_number(substr(d.product_date_price,-10)) price 
from 
(    select p.product_id
         ,pc.color
         ,( SELECT max(composite_column)  FROM product_price pp WHERE pp.product_id = p.product_id AND pp.from_date = ( SELECT max(pp2.from_date) FROM product_price pp2 WHERE pp2.product_id = pp.product_id ) ) product_date_price
     from product p
     left join product_color pc on pc.product_id = p.product_id )  d
;

select product_id from product_5nf;

----------------------------------------------
| Id  | Operation         | Name    | E-Rows |
----------------------------------------------
|   0 | SELECT STATEMENT  |         |        |
|   1 |  TABLE ACCESS FULL| PRODUCT |    100K|
----------------------------------------------

select * from product_5nf;

SELECT *
FROM   TABLE (DBMS_XPLAN.display_cursor (null, null,
                                         'ALLSTATS LAST'));

--------------------------------------------------------------------------------------
| Id  | Operation                | Name          | E-Rows |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |               |        |       |       |          |
|*  1 |  HASH JOIN RIGHT OUTER   |               |    100K|  8387K|  3159K| 8835K (0)|
|   2 |   VIEW                   | VW_SSQ_2      |      2 |       |       |          |
|   3 |    HASH GROUP BY         |               |      2 |    13M|  2332K|   12M (0)|
|   4 |     VIEW                 | VM_NWVW_3     |      2 |       |       |          |
|*  5 |      FILTER              |               |        |       |       |          |
|   6 |       HASH GROUP BY      |               |      2 |    23M|  5055K|   20M (0)|
|*  7 |        HASH JOIN         |               |    480K|    12M|  4262K|   17M (0)|
|   8 |         TABLE ACCESS FULL| PRODUCT_PRICE |    220K|       |       |          |
|   9 |         TABLE ACCESS FULL| PRODUCT_PRICE |    220K|       |       |          |
|* 10 |   HASH JOIN OUTER        |               |    100K|  5918K|  3056K| 5847K (0)|
|  11 |    TABLE ACCESS FULL     | PRODUCT       |    100K|       |       |          |
|  12 |    TABLE ACCESS FULL     | PRODUCT_COLOR |    100K|       |       |          |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ITEM_2"="P"."PRODUCT_ID")
   5 - filter("PP"."FROM_DATE"=MAX("PP2"."FROM_DATE"))
   7 - access("PP2"."PRODUCT_ID"="PP"."PRODUCT_ID")
  10 - access("PC"."PRODUCT_ID"="P"."PRODUCT_ID")

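OK, I'm answering my own question. The information in this answer is valid for Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production, but may not be for later versions. Please don't upvote this answer, since it doesn't actually answer the question.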

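Due to a specific limitation in the current version (as described by Mathew McPeak), it is simply not possible to get Oracle to fully eliminate the unnecessary joins in the underlying 5NF view. The limitation is that joins cannot be eliminated on left joins based on composite keys.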

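Any attempt to work around this limitation seems to introduce either duplication or update anomalies. The accepted answer shows how to overcome this limitation of the optimizer by using a materialized view, thereby duplicating data. This answer shows how to solve the problem with less duplication, at the cost of update anomalies.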

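This workaround relies on the fact that a unique index can contain nullable columns. We use null for all historical versions and the actual product_id for the latest version, which references the product table through a foreign key.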
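The property this relies on can be checked with a hypothetical scratch table (uk_null_demo is an illustrative name, not from the answer): Oracle does not compare entirely-NULL keys in a single-column unique constraint, so any number of NULL rows is allowed while non-NULL values must still be unique.

```sql
-- Hypothetical scratch table, only to demonstrate NULL handling in a unique constraint.
create table uk_null_demo(
   latest_id number
  ,constraint uk_null_demo_uk unique(latest_id)
);

insert into uk_null_demo values(null);  -- succeeds
insert into uk_null_demo values(null);  -- also succeeds: NULL keys are not compared
insert into uk_null_demo values(1);     -- succeeds
insert into uk_null_demo values(1);     -- fails with ORA-00001: unique constraint violated

drop table uk_null_demo purge;
```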

alter table product_price add(
   latest_id number
  ,constraint product_price_uk  unique(latest_id)
  ,constraint product_price_fk2 foreign key(latest_id) references product(product_id)
  ,constraint product_price_chk check(latest_id = product_id)
);

-- One-time update of existing data
update product_price a
   set a.latest_id = a.product_id
 where from_date = (select max(from_date) 
                      from product_price b 
                     where a.product_id = b.product_id);   

PRODUCT_ID FROM_DATE       PRICE  LATEST_ID
---------- ---------- ---------- ----------
         1 2016-01-01         10       null
         1 2016-02-01          8       null
         1 2016-05-01          5          1
         2 2016-02-01          5          2
         4 2016-01-01         10          4

-- New view definition             
create or replace view product_5nf as
   select p.product_id
         ,pc.color
         ,pp.price
     from product p
     left join product_color pc on(pc.product_id = p.product_id)
     left join product_price pp on(pp.latest_id  = p.product_id);

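Of course, latest_id now has to be maintained manually... Whenever a new record is inserted, the previous latest record must first be updated to null.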
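The manual maintenance could look like the following sketch; the product, date and price are made-up values. It assumes the new row really is the newest version for that product. A back-dated insert would instead require moving latest_id to whichever row has the maximum from_date.

```sql
-- Hypothetical example: product 1 gets a new price of 4 from 2016-06-01.
-- Step 1: release the unique key held by the current latest version
-- (doing the insert first would violate product_price_uk).
update product_price
   set latest_id = null
 where latest_id = 1;

-- Step 2: insert the new version and mark it as the latest one;
-- product_price_chk requires latest_id = product_id.
insert into product_price(product_id, from_date, price, latest_id)
values(1, date '2016-06-01', 4, 1);

commit;
```

Doing both statements in the same transaction keeps other sessions from seeing a product with no latest price.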

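This approach has two benefits. First, Oracle is able to remove the unnecessary joins completely. Second, the joins are not executed as scalar subqueries.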

SQL> select count(*) from product_5nf;

---------------------------------------
| Id  | Operation        | Name       |
---------------------------------------
|   0 | SELECT STATEMENT |            |
|   1 |  SORT AGGREGATE  |            |
|   2 |   INDEX FULL SCAN| PRODUCT_PK |
---------------------------------------

Oracle recognizes that the count can be resolved without touching the base tables, and there are no unnecessary joins in sight...

SQL> select product_id, price from product_5nf;

---------------------------------------------------------
| Id  | Operation                    | Name             |
---------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |
|*  1 |  HASH JOIN OUTER             |                  |
|   2 |   INDEX FULL SCAN            | PRODUCT_PK       |
|   3 |   TABLE ACCESS BY INDEX ROWID| PRODUCT_PRICE    |
|*  4 |    INDEX FULL SCAN           | PRODUCT_PRICE_UK |
---------------------------------------------------------

Oracle recognizes that it has to join product_price to get the price column, and product_color is gone...

SQL> select * from product_5nf;

----------------------------------------------------------
| Id  | Operation                     | Name             |
----------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |
|*  1 |  HASH JOIN OUTER              |                  |
|   2 |   NESTED LOOPS OUTER          |                  |
|   3 |    INDEX FULL SCAN            | PRODUCT_PK       |
|   4 |    TABLE ACCESS BY INDEX ROWID| PRODUCT_COLOR    |
|*  5 |     INDEX UNIQUE SCAN         | PRODUCT_COLOR_PK |
|   6 |   TABLE ACCESS BY INDEX ROWID | PRODUCT_PRICE    |
|*  7 |    INDEX FULL SCAN            | PRODUCT_PRICE_UK |
----------------------------------------------------------

Here Oracle has to perform all the joins, because all columns are referenced.

[I don't know whether an ANTI-JOIN counts as a subquery in Oracle], but the NOT EXISTS trick is often a way to avoid an aggregate subquery:

CREATE VIEW product_5nfa as
   SELECT p.product_id
         ,pc.color
         ,pp.price
     FROM product p
     LEFT JOIN product_color pc
        ON pc.product_id = p.product_id
     LEFT join product_price pp
        ON pp.product_id = p.product_id
        AND NOT EXISTS ( SELECT * FROM product_price pp2
            WHERE pp2.product_id = pp.product_id
            AND pp2.from_date  > pp.from_date
            )   
     ;

Comment from the OP: The view was created, but Oracle still cannot eliminate the join. Here is the execution plan:

select count(*) from product_5nfa;

-------------------------------------------------
| Id  | Operation            | Name             |
-------------------------------------------------
|   0 | SELECT STATEMENT     |                  |
|   1 |  SORT AGGREGATE      |                  |
|   2 |   NESTED LOOPS OUTER |                  |
|   3 |    INDEX FULL SCAN   | PRODUCT_PK       |
|   4 |    VIEW              |                  |
|   5 |     NESTED LOOPS ANTI|                  |
|*  6 |      INDEX RANGE SCAN| PRODUCT_PRICE_PK |
|*  7 |      INDEX RANGE SCAN| PRODUCT_PRICE_PK |
-------------------------------------------------