Performance problem with QUERY using BIND variables and OR condition in Oracle 12.2

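I'm having a hard time understanding why the Oracle CBO behaves the way it does when a bind variable is part of an OR condition.

My environment

Oracle 12.2 on Red Hat Linux 7

Note: I am only providing a simplified version of the query where the problem is.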

$ sqlplus / as sysdba

SQL*Plus: Release 12.2.0.1.0 Production on Thu Jun 10 15:40:07 2021

Copyright (c) 1982, 2016, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> @test.sql
SQL> var loanIds varchar2(4000);
SQL> exec :loanIds := '100000018330,100000031448,100000013477,100000023115,100000022550,100000183669,100000247514,100000048198,100000268289';

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.00
SQL> SELECT
  2  whs.* ,
  3  count(*) over () AS TOTAL
  4  FROM ALFAMVS.WHS_LOANS whs
  5  WHERE
  6  ( nvl(:loanIds,'XX') = 'XX' or
  7              loanid IN (select regexp_substr(NVL(:loanIds,''),'[^,]+', 1, level) from dual
  8                                           connect by level <= regexp_count(:loanIds,'[^,]+'))
  9  )
 10  ;

7 rows selected.

Elapsed: 00:00:18.72

Execution Plan
----------------------------------------------------------
Plan hash value: 2980809427

------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |           |  6729 |  6748K|  2621   (1)| 00:00:01 |
|   1 |  WINDOW BUFFER                           |           |  6729 |  6748K|  2621   (1)| 00:00:01 |
|*  2 |   FILTER                                 |           |       |       |            |          |
|   3 |    TABLE ACCESS FULL                     | WHS_LOANS |   113K|   110M|  2621   (1)| 00:00:01 |
|*  4 |    FILTER                                |           |       |       |            |          |
|*  5 |     CONNECT BY WITHOUT FILTERING (UNIQUE)|           |       |       |            |          |
|   6 |      FAST DUAL                           |           |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(NVL(:LOANIDS,'XX')='XX' OR  EXISTS (SELECT 0 FROM "DUAL" "DUAL" WHERE
              SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1 CONNECT BY LEVEL<=
              REGEXP_COUNT (:LOANIDS,'[^,]+')))
   4 - filter(SYS_OP_C2C( REGEXP_SUBSTR (NVL(:LOANIDS,''),'[^,]+',1,LEVEL))=:B1)
   5 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))


Statistics
----------------------------------------------------------
        288  recursive calls
        630  db block gets
       9913  consistent gets
          1  physical reads
     118724  redo size
      13564  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
     113003  sorts (memory)
          0  sorts (disk)
          7  rows processed

SQL> set autotrace off
SQL> select count(*) from ALFAMVS.WHS_LOANS ;

  COUNT(*)
----------
    113095

1 row selected.

Elapsed: 00:00:00.14

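Key points

  1. I do know that if I replace the OR expression with two selects combined with UNION ALL, it works perfectly. The problem is that I have many conditions written the same way, so UNION ALL is not a solution for me.
  2. The table has up-to-date statistics, gathered with FOR ALL COLUMNS SIZE AUTO and ESTIMATE PERCENT 10%.
  3. Dynamic SQL is not a solution for me, because the query is called by third-party software that uses a web API to convert the results to JSON.
  4. I was able to rewrite the regular expression with connect by level so that the query now takes 19 seconds. Before, it took 40 seconds.
  5. The table has only 113K records and no indexes.
  6. The query has 20 conditions of this kind, all written the same way, because the web application screen that fires the query through the API lets users supply any combination of the parameters, or none at all.

If I remove the expression NVL(:loanIds,'XX') = 'XX' OR, the query takes 0.01 seconds. Why does this OR expression with a bind variable give the optimizer such a headache?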

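-- UPDATE --

I want to thank @Alex Poole for his suggestions, and let him know that the third option (removing the regular expressions) works very well. Still, it would be great to understand why. You have my sincerest thanks. I have used this pattern for a while and never ran into this problem before. Also, the regexp_like suggestion performs even better than my original regexp_substr with connect by level, but it is much slower than the option that avoids regular expressions entirely.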

Original query

7 rows selected.

Elapsed: 00:00:36.29

New query

7 rows selected.

Elapsed: 00:00:00.58

Once the EXISTS in the inner predicate is gone, the query is fast as hell.

Thanks everyone for your input!

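For some reason, going by the execution plan, the optimizer re-evaluates the hierarchical query for every row in your table, and then uses exists() to see whether that row's ID is in the result. It isn't clear why the or causes that. It might be worth raising with Oracle.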
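To make the cost of that per-row re-evaluation concrete, here is a rough Python model (not Oracle's execution engine): the hypothetical `split_csv` stands in for the `regexp_substr ... connect by level` expansion, and a counter records how often it runs. It ignores Oracle's caching of filter-subquery results, which presumably helps little here because nearly every loanid is distinct.

```python
from typing import List, Optional, Tuple


def split_csv(csv: str) -> List[str]:
    # Stand-in for regexp_substr(csv, '[^,]+', 1, level) ... connect by level <= regexp_count(csv, '[^,]+'):
    # every maximal run of non-comma characters, in order.
    return [token for token in csv.split(",") if token]


def filter_split_per_row(loan_ids: List[str], csv: Optional[str]) -> Tuple[List[str], int]:
    """Mimics the slow plan: the OR'd EXISTS subquery re-splits the bind string for each row."""
    kept: List[str] = []
    splits = 0
    for loan_id in loan_ids:
        if csv is None:  # the "no filter" branch of the OR short-circuits
            kept.append(loan_id)
            continue
        splits += 1  # one CONNECT BY expansion per candidate row
        if loan_id in split_csv(csv):
            kept.append(loan_id)
    return kept, splits


if __name__ == "__main__":
    # Hypothetical IDs; only the row count matches WHS_LOANS.
    table = [str(100000000000 + i) for i in range(113_095)]
    rows, splits = filter_split_per_row(table, "100000000005,100000000042")
    print(len(rows), splits)  # 2 113095
```

In the model the split runs once per row, so the expensive work scales with the table size rather than with the number of IDs in the bind value.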

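By experimenting, I found three ways to at least partly work around the problem, though I'm sure there are others. The first is to move the CSV expansion into a CTE and then force it to be materialized with a hint: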

WITH loanIds_cte (loanId) as (
  select /*+ materialize */ regexp_substr(:loanIds,'[^,]+', 1, level)
  from dual
  connect by level <= regexp_count(:loanIds,'[^,]+')
)
SELECT
 whs.* ,
  count(*) over () AS TOTAL
  FROM WHS_LOANS whs
  WHERE
  ( :loanIds is null or
              loanid IN (select loanId from loanIds_cte)
  )
;

PLAN_TABLE_OUTPUT                                                                   
------------------------------------------------------------------------------------
Plan hash value: 3226738189
 
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                             |  1102 |  9918 |    11   (0)| 00:00:01 |
|   1 |  TEMP TABLE TRANSFORMATION     |                             |       |       |            |          |
|   2 |   LOAD AS SELECT               | SYS_TEMP_0FD9FD2A6_198A2E1A |       |       |            |          |
|*  3 |    CONNECT BY WITHOUT FILTERING|                             |       |       |            |          |
|   4 |     FAST DUAL                  |                             |     1 |       |     2   (0)| 00:00:01 |
|   5 |   WINDOW BUFFER                |                             |  1102 |  9918 |     9   (0)| 00:00:01 |
|*  6 |    FILTER                      |                             |       |       |            |          |
|   7 |     TABLE ACCESS FULL          | WHS_LOANS                   | 11300 |    99K|     9   (0)| 00:00:01 |
|*  8 |     VIEW                       |                             |     1 |  2002 |     2   (0)| 00:00:01 |
|   9 |      TABLE ACCESS FULL         | SYS_TEMP_0FD9FD2A6_198A2E1A |     1 |  2002 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   3 - filter(LEVEL<= REGEXP_COUNT (:LOANIDS,'[^,]+'))
   6 - filter(:LOANIDS IS NULL OR  EXISTS (SELECT 0 FROM  (SELECT /*+ CACHE_TEMP_TABLE ("T1") */ "C0" 
              "LOANID" FROM "SYS"."SYS_TEMP_0FD9FD2A6_198A2E1A" "T1") "LOANIDS_CTE" WHERE SYS_OP_C2C("LOANID")=:B1))
   8 - filter(SYS_OP_C2C("LOANID")=:B1)

This still does the odd transformation to exists(), but at least now it is querying the materialized CTE, so the connect by query is only evaluated once.
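The same rough Python model can mirror the materialized plan: the split happens once (the LOAD AS SELECT into the temp table), and each row then only probes that small list, which is what the remaining EXISTS does. The function names are hypothetical and the counters are illustrative, not Oracle statistics.

```python
from typing import List, Optional, Tuple


def split_csv(csv: str) -> List[str]:
    # Same tokenisation as regexp_substr(csv, '[^,]+', 1, level) with connect by level.
    return [token for token in csv.split(",") if token]


def filter_materialized(loan_ids: List[str], csv: Optional[str]) -> Tuple[List[str], int, int]:
    """Mimics the CTE plan: split once into a 'temp table', then probe it per row (the EXISTS)."""
    if csv is None:
        return list(loan_ids), 0, 0
    temp_table = split_csv(csv)  # LOAD AS SELECT: the CONNECT BY runs exactly once
    splits = 1
    probes = 0
    kept: List[str] = []
    for loan_id in loan_ids:
        for value in temp_table:  # TABLE ACCESS FULL of the small temp table
            probes += 1
            if value == loan_id:
                kept.append(loan_id)
                break
    return kept, splits, probes
```

The per-row work left is a scan of a list with as many entries as there are IDs in the bind value, which is cheap compared with re-running the split.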

Or you can use a regular expression to compare each loanId value against the full string:

SELECT
 whs.* ,
  count(*) over () AS TOTAL
  FROM WHS_LOANS whs
  WHERE
  ( :loanIds is null or 
    regexp_like(:loanIds, '(^|,)' || loanId || '(,|$)')
  )
;

PLAN_TABLE_OUTPUT                                                                   
------------------------------------------------------------------------------------
Plan hash value: 1622376598
 
--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |  1102 |  9918 |     9   (0)| 00:00:01 |
|   1 |  WINDOW BUFFER     |           |  1102 |  9918 |     9   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| WHS_LOANS |  1102 |  9918 |     9   (0)| 00:00:01 |
--------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter(:LOANIDS IS NULL OR  REGEXP_LIKE 
              (:LOANIDS,SYS_OP_C2C(U'(^|,)'||"LOANID"||U'(,|$)')))

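In my testing this was slower than the CTE, because regular expressions are still expensive and you are doing 113k of them (though that is still better than 2 x 113k x the number of elements).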
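The pattern places each ID between a start-of-string or comma and a comma or end-of-string, so an ID cannot match as a fragment of a longer one. A Python sketch of the same predicate follows; the function name is hypothetical, and Python's `re.search` stands in for Oracle's REGEXP_LIKE, which is likewise satisfied by a match anywhere in the string.

```python
import re
from typing import Optional


def regexp_like_filter(csv: Optional[str], loan_id: str) -> bool:
    """Model of: :loanIds is null or regexp_like(:loanIds, '(^|,)' || loanId || '(,|$)')."""
    if csv is None:
        return True
    # re.escape is an extra safeguard added here; the numeric-looking IDs contain no regex metacharacters.
    pattern = "(^|,)" + re.escape(loan_id) + "(,|$)"
    return re.search(pattern, csv) is not None


if __name__ == "__main__":
    ids = "100000018330,100000031448,100000013477"
    print(regexp_like_filter(ids, "100000031448"))  # True
    print(regexp_like_filter(ids, "10000001833"))   # False: only a prefix of a listed ID
```

Each call compiles and runs one regular expression, so over the whole table this is one regex evaluation per row.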

Or you can avoid regular expressions altogether and use a few separate comparisons:

SELECT
 whs.* ,
  count(*) over () AS TOTAL
  FROM WHS_LOANS whs
  WHERE
  ( :loanIds is null or 
    :loanIds like loanId || ',%' or
    :loanIds like '%,' || loanId or
    :loanIds like '%,' || loanId || ',%'
  )
;

PLAN_TABLE_OUTPUT                                                                    
------------------------------------------------------------------------------------
Plan hash value: 1622376598
 
--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |  2096 | 18864 |     9   (0)| 00:00:01 |
|   1 |  WINDOW BUFFER     |           |  2096 | 18864 |     9   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| WHS_LOANS |  2096 | 18864 |     9   (0)| 00:00:01 |
--------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - filter(:LOANIDS IS NULL OR :LOANIDS LIKE 
              SYS_OP_C2C("LOANID"||U',%') OR :LOANIDS LIKE 
              SYS_OP_C2C(U'%,'||"LOANID") OR :LOANIDS LIKE 
              SYS_OP_C2C(U'%,'||"LOANID"||U',%'))

In my limited testing this was the fastest of the three options. But there may well be better and faster approaches.
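The three LIKE patterns cover an ID at the start, at the end, and in the middle of the list. Because the IDs contain no LIKE wildcards (`_` or `%`), each pattern is equivalent to a plain prefix, suffix or substring test, as in the Python sketch below (function names are hypothetical). The sketch also makes visible that, as written, the patterns only match an ID with a comma on at least one side, so a bind value holding a single ID would match no rows; an extra `:loanIds = loanId` condition, modelled in the second function, would cover that case.

```python
from typing import Optional


def like_filter(csv: Optional[str], loan_id: str) -> bool:
    """Model of the three LIKE conditions exactly as written in the query above."""
    return (
        csv is None
        or csv.startswith(loan_id + ",")  # :loanIds like loanId || ',%'         (first element)
        or csv.endswith("," + loan_id)    # :loanIds like '%,' || loanId         (last element)
        or ("," + loan_id + ",") in csv   # :loanIds like '%,' || loanId || ',%' (middle element)
    )


def like_filter_with_exact(csv: Optional[str], loan_id: str) -> bool:
    """Same, plus a hypothetical extra ':loanIds = loanId' condition for a one-element list."""
    return like_filter(csv, loan_id) or csv == loan_id
```

These are plain string comparisons with no regular expression engine involved, which fits the observation that this variant was the fastest.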


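Not really relevant, but: you seem to be running as SYS, which isn't a good idea even if the data is in another schema; your loanId column appears to be nvarchar2 (judging by the SYS_OP_C2C calls), which seems odd for something that might be a number and in any case seems to contain only ASCII characters; NVL(:loanIds,'') does nothing, because null and the empty string are the same in Oracle; and nvl(:loanIds,'XX') = 'XX' can be written as :loanIds is null, which avoids the magic value.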
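A small Python model of the two NULL remarks, assuming only Oracle's rule that a zero-length VARCHAR2 value is treated as NULL (`oracle_varchar` and `nvl` are hypothetical helpers, not Oracle APIs). It checks that the two tests agree for NULL, '' and a real ID list and differ only for the magic value 'XX' itself, and that NVL(x, '') returns x unchanged.

```python
from typing import Optional


def oracle_varchar(value: Optional[str]) -> Optional[str]:
    """Oracle treats a zero-length VARCHAR2 value as NULL; model that by mapping '' to None."""
    return None if value == "" else value


def nvl(value: Optional[str], default: Optional[str]) -> Optional[str]:
    """Model of NVL: return default when value is NULL."""
    return default if value is None else value


def magic_value_check(bind: Optional[str]) -> bool:
    # nvl(:loanIds, 'XX') = 'XX'  (also true when the bind value is literally 'XX')
    return nvl(oracle_varchar(bind), "XX") == "XX"


def is_null_check(bind: Optional[str]) -> bool:
    # :loanIds is null
    return oracle_varchar(bind) is None


if __name__ == "__main__":
    for bind in (None, "", "100000018330,100000031448", "XX"):
        print(repr(bind), magic_value_check(bind), is_null_check(bind))
```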