Function Based Index not improving query performance
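I have created a view, and in it I added the CASE expression below, which I need. I have also created a function-based index for that expression. The view has 1,900,000 records. When I try to run the view, it takes hours, and its performance is very poor. I don't understand how to improve the performance.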
CREATE OR REPLACE VIEW
TST_AGG
(
ROOT) AS
Select
CASE
WHEN regexp_like(ticker, '\s.*\s')
THEN SUBSTR(ticker, 1, instr(ticker, ' ')-1)
WHEN regexp_like(ticker, '\s')
THEN
CASE
WHEN regexp_like(SUBSTR(ticker, 1, instr(ticker, ' ')-1), '(P|C)$')
AND LENGTH(SUBSTR(ticker, 1, instr(ticker, ' ')-1)) >= 4
THEN SUBSTR(SUBSTR(ticker, 1, instr(ticker, ' ')-1), 1, LENGTH(SUBSTR(ticker, 1
, instr(ticker, ' ')-1))-3)
WHEN regexp_like(SUBSTR(ticker, 1, instr(ticker, ' ')-1), '\w\d\d\w\d$')
THEN SUBSTR(SUBSTR(ticker, 1, instr(ticker, ' ')-1), 1, LENGTH(SUBSTR(ticker, 1
, instr(ticker, ' ')-1))-5)
WHEN regexp_like(SUBSTR(ticker, 1, instr(ticker, ' ')), '\w\d\w\d$')
THEN SUBSTR(SUBSTR(ticker, 1, instr(ticker, ' ')-1), 1, LENGTH(SUBSTR(ticker, 1
, instr(ticker, ' ')-1))-4)
ELSE SUBSTR(ticker, 1, instr(ticker, ' ')-1)
END
WHEN regexp_like(ticker, '(P|C)$')
AND LENGTH(ticker) >= 4
THEN SUBSTR(ticker, 1, LENGTH(ticker)-3)
WHEN regexp_like(ticker, '\w\d\d\w\d$')
THEN SUBSTR(ticker, 1, LENGTH(ticker)-5)
WHEN regexp_like(ticker, '\w\d\w\d$')
THEN SUBSTR(ticker, 1, LENGTH(ticker)-4)
ELSE ticker
END AS ROOT
FROM TTT_IMP;
Below is the function-based index I created:
CREATE INDEX "IDX_ROOT" ON "TTT_IMP" (CASE WHEN REGEXP_LIKE ("TICKER",'\s.*\s') THEN SUBSTR("TICKER",1,INSTR("TICKER",' ')-1) WHEN REGEXP_LIKE ("TICKER",'\s') THEN CASE WHEN ( REGEXP_LIKE (SUBSTR("TICKER",1,INSTR("TICKER",' ')-1),'(P|C)$') AND LENGTH(SUBSTR("TICKER",1,INSTR("TICKER",' ')-1))>=4) THEN SUBSTR(SUBSTR("TICKER",1,INSTR("TICKER",' ')-1),1,LENGTH(SUBSTR("TICKER",1,INSTR("TICKER",' ')-1))-3) WHEN REGEXP_LIKE (SUBSTR("TICKER",1,INSTR("TICKER",' ')-1),'\w\d\d\w\d$') THEN SUBSTR(SUBSTR("TICKER",1,INSTR("TICKER",' ')-1),1,LENGTH(SUBSTR("TICKER",1,INSTR("TICKER",' ')-1))-5) WHEN REGEXP_LIKE (SUBSTR("TICKER",1,INSTR("TICKER",' ')),'\w\d\w\d$') THEN SUBSTR(SUBSTR("TICKER",1,INSTR("TICKER",' ')-1),1,LENGTH(SUBSTR("TICKER",1,INSTR("TICKER",' ')-1))-4) ELSE SUBSTR("TICKER",1,INSTR("TICKER",' ')-1) END WHEN ( REGEXP_LIKE ("TICKER",'(P|C)$') AND LENGTH("TICKER")>=4) THEN SUBSTR("TICKER",1,LENGTH("TICKER")-3) WHEN REGEXP_LIKE ("TICKER",'\w\d\d\w\d$') THEN SUBSTR("TICKER",1,LENGTH("TICKER")-5) WHEN REGEXP_LIKE ("TICKER",'\w\d\w\d$') THEN SUBSTR("TICKER",1,LENGTH("TICKER")-4) ELSE "TICKER" END );
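An index can be used in two scenarios.
1) To reduce the number of rows selected, i.e.
SELECT ... FROM TST_AGG where ROOT = ...
2) To avoid visiting the table, and with it the expensive computation
SELECT ROOT FROM TST_AGG
I'm assuming the latter. The optimizer can use the index alone (in place of the table) only if it knows that index entries correspond one-to-one with table rows. Because it cannot tell whether the expression might return NULL (rows whose indexed expression is NULL are not stored in the index), it cannot make that direct swap unless you tell it so.
Hence
SELECT ROOT FROM TST_AGG
will not have the opportunity to use the index, but
SELECT ROOT FROM TST_AGG WHERE ROOT IS NOT NULL
should be able to.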
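To check whether the rewritten query really avoids the table, you can look at the execution plan. This is only a sketch: it assumes the IDX_ROOT index from the question exists and that statistics are reasonably current. The optimizer may still pick a full table scan if it estimates that to be cheaper.

```sql
-- Ask the optimizer for its plan without actually running the query.
EXPLAIN PLAN FOR
  SELECT ROOT FROM TST_AGG WHERE ROOT IS NOT NULL;

-- Display the plan that was just written to PLAN_TABLE.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the index is being used, the plan should show an index access on IDX_ROOT (for example an INDEX FAST FULL SCAN) instead of TABLE ACCESS FULL on TTT_IMP.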
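Another thing to watch out for is that we (Oracle) may alter the syntax of the expression when we store it for the index. So check USER_IND_EXPRESSIONS, and perhaps put that stored expression back into the view definition so the two match exactly.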
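A minimal way to see what was actually stored, assuming the index is named IDX_ROOT as in the question and is owned by the current user:

```sql
-- COLUMN_EXPRESSION holds the expression text as the database stored it,
-- which can differ in quoting and parenthesisation from what was typed.
SELECT index_name,
       column_position,
       column_expression
FROM   user_ind_expressions
WHERE  index_name = 'IDX_ROOT';
```

COLUMN_EXPRESSION is a LONG column, so in SQL*Plus you may need something like SET LONG 4000 to see the whole text.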
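I suggest you review your data model; those regular expressions are really ugly. Store the relevant information directly in a column of its own instead of hiding it somewhere inside the ticker string.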
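In any case, I would suggest creating a virtual column instead of a view. You can then create an index on that virtual column, and it should be used as well. It would look similar to this: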
ALTER TABLE TTT_IMP ADD (ROOT VARCHAR2(20) GENERATED ALWAYS AS (
CAST(
CASE
WHEN regexp_like(ticker, '\s.*\s')
THEN SUBSTR(ticker, 1, instr(ticker, ' ')-1)
WHEN regexp_like(ticker, '\s')
THEN
CASE
WHEN regexp_like(SUBSTR(ticker, 1, instr(ticker, ' ')-1), '(P|C)$')
AND LENGTH(SUBSTR(ticker, 1, instr(ticker, ' ')-1)) >= 4
THEN SUBSTR(SUBSTR(ticker, 1, instr(ticker, ' ')-1), 1, LENGTH(SUBSTR(ticker, 1
, instr(ticker, ' ')-1))-3)
WHEN regexp_like(SUBSTR(ticker, 1, instr(ticker, ' ')-1), '\w\d\d\w\d$')
THEN SUBSTR(SUBSTR(ticker, 1, instr(ticker, ' ')-1), 1, LENGTH(SUBSTR(ticker, 1
, instr(ticker, ' ')-1))-5)
WHEN regexp_like(SUBSTR(ticker, 1, instr(ticker, ' ')), '\w\d\w\d$')
THEN SUBSTR(SUBSTR(ticker, 1, instr(ticker, ' ')-1), 1, LENGTH(SUBSTR(ticker, 1
, instr(ticker, ' ')-1))-4)
ELSE SUBSTR(ticker, 1, instr(ticker, ' ')-1)
END
WHEN regexp_like(ticker, '(P|C)$')
AND LENGTH(ticker) >= 4
THEN SUBSTR(ticker, 1, LENGTH(ticker)-3)
WHEN regexp_like(ticker, '\w\d\d\w\d$')
THEN SUBSTR(ticker, 1, LENGTH(ticker)-5)
WHEN regexp_like(ticker, '\w\d\w\d$')
THEN SUBSTR(ticker, 1, LENGTH(ticker)-4)
ELSE ticker
END
AS VARCHAR2(20))
) VIRTUAL);
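Once the virtual column exists, indexing and querying it might look like the sketch below. The index name IDX_ROOT_VC and the literal 'ABC' are made up for illustration. Note that an index-only scan over all rows is still subject to the NULL issue, so a ROOT IS NOT NULL predicate may still be needed for that case.

```sql
-- Hypothetical index name; indexes the virtual column directly.
CREATE INDEX idx_root_vc ON ttt_imp (root);

-- Refresh statistics so the optimizer knows about the new column and index.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'TTT_IMP',
    cascade => TRUE);
END;
/

-- Query the table directly; no view and no repeated CASE expression needed.
SELECT root
FROM   ttt_imp
WHERE  root = 'ABC';   -- 'ABC' is a placeholder value
```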
To add to @ConnorMcDonnald's excellent answer, here is a short excerpt from the documentation:
12.4.2 Disadvantages of Function-Based Indexes
If the index expression is a function invocation, then the function
return type cannot be constrained.
Because you cannot constrain the function return type with NOT NULL,
you must ensure that the query that uses the index cannot fetch NULL
values. Otherwise, the database performs a full table scan.