Replace characters after a defined pattern with regexp_replace in oracle
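I want to use Oracle's regexp_replace function to replace single characters in a string. The replacement should start only from a defined pattern in the string.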
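Example:
In the string "Heyho || HeyheyHo" I want to replace every "y" that comes after the pattern "||" with the character "i". Characters that appear before the pattern should be left alone.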
String:
Heyho || HeyheyHo
String after the replacement:
Heyho || HeiheiHo
Is there an easy way to do this?
You don't need regular expressions; you can use INSTR, SUBSTR and REPLACE to do what you need:
with test(s) as (
select 'Heyho || HeyheyHo' from dual
)
/* the query */
select s as input,
substr(s, 1, instr(s, '||')+1) ||
replace( substr(s, instr(s, '||')+2), 'y', 'i') as result
from test
gives:
INPUT RESULT
----------------- --------------------
Heyho || HeyheyHo Heyho || HeiheiHo
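One edge case worth guarding against: when a string contains no "||", INSTR returns 0, so the expression above becomes substr(s, 1, 1) || replace(substr(s, 2), 'y', 'i') and rewrites every "y" after the first character. A minimal sketch of a guard, assuming such rows should simply be returned unchanged (the second test row is a made-up example):

```sql
-- Same SUBSTR/INSTR/REPLACE idea, but rows without '||' are left untouched.
with test(s) as (
  select 'Heyho || HeyheyHo' from dual union all
  select 'Heyho HeyheyHo'    from dual   -- hypothetical row with no delimiter
)
select s as input,
       case
         when instr(s, '||') = 0 then s   -- no delimiter: nothing to replace
         else substr(s, 1, instr(s, '||') + 1) ||
              replace(substr(s, instr(s, '||') + 2), 'y', 'i')
       end as result
from test;
```

With the guard the second row comes back as "Heyho HeyheyHo"; without it, the same row would become "Heiho HeiheiHo".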
How it works:
with test(s) as (
select 'Heyho || HeyheyHo' from dual
)
/* the same query, split into its parts */
select s as input,
substr(s, 1, instr(s, '||')+1) beforeDelimiter,
substr(s, instr(s, '||')+2) afterDelimiter,
replace( substr(s, instr(s, '||')+2), 'y', 'i') afterDelimiterEdited,
substr(s, 1, instr(s, '||')+1) ||
replace( substr(s, instr(s, '||')+2), 'y', 'i') as result
from test
gives:
INPUT             BEFOREDELI AFTERDELIM AFTERDELIM RESULT
----------------- ---------- ---------- ---------- --------------------
Heyho || HeyheyHo Heyho ||    HeyheyHo   HeiheiHo  Heyho || HeiheiHo
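The +1 and +2 offsets are easiest to see on the sample string itself. This small query just prints the positions the expressions above rely on:

```sql
-- INSTR returns the position of the first '|' of '||' (7 for this string).
-- +1 (8) is the last character of the untouched prefix 'Heyho ||';
-- +2 (9) is where the edited tail ' HeyheyHo' starts.
select instr('Heyho || HeyheyHo', '||')     as delim_pos,   -- 7
       instr('Heyho || HeyheyHo', '||') + 1 as prefix_len,  -- 8
       instr('Heyho || HeyheyHo', '||') + 2 as tail_start   -- 9
from dual;
```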
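If "||" occurs more than once in the string, REPLACE changes every "y" after the first occurrence, including the ones after any later "||".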
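If the replacement should instead start after the last "||", one possible sketch uses INSTR with a negative start position, which makes it search backwards from the end of the string. The input below is a made-up example with two delimiters:

```sql
-- Start the replacement after the LAST '||' instead of the first one.
with test(s) as (
  select 'Hey || hey || hey' from dual   -- hypothetical input
)
select s as input,
       substr(s, 1, instr(s, '||', -1) + 1) ||
       replace(substr(s, instr(s, '||', -1) + 2), 'y', 'i') as after_last
from test;
```

This returns "Hey || hey || hei", whereas the first-occurrence version gives "Hey || hei || hei".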
After Mathguy's comment, I can't claim that this solution is faster than a regular-expression one.
A solution using a regular expression could be:
select regexp_replace(s, 'y', 'i', instr(s, '||') ) as result
Here is a small performance test on two tables created in the same way with the same data (5 million rows):
SQL> create table testA3(s) as
select regexp_replace(s, 'y', 'i', instr(s, '||') ) as result
from testA;
Table created.
Elapsed: 00:00:30.75
SQL> create table testB3(s) as
select substr(s, 1, instr(s, '||')+1) ||
replace( substr(s, instr(s, '||')+2), 'y', 'i') as result
from testB;
Table created.
Elapsed: 00:00:14.82
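The standard approach seems faster here; in the same test on 3M rows, the regular-expression approach took about 18 seconds and the standard one about 7 seconds.
The test is of course not exhaustive, and the results may change depending on many factors. Still, even in a case like this one, where several standard function calls are needed to get the same result as a single regular expression, the standard approach is a good alternative worth considering.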
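Here is the complete 3M-row test; I used one CREATE and two INSERTs to avoid memory problems with CONNECT BY at very high levels.
Also, between the 3M-row and 5M-row tests I dropped and re-created the tables, to make sure caching did not affect the results.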
SQL> create table testA(s) as
select 'Heyho || HeyheyHo' || level || 'HeyheyHo'
from dual
connect by level <= 1000000;
Table created.
SQL> create table testB(s) as
select 'Heyho || HeyheyHo' || level || 'HeyheyHo'
from dual
connect by level <= 1000000;
Table created.
SQL> insert into testB(s)
select 'Heyho || HeyheyHo' || to_char(level + 1000000) || 'HeyheyHo'
from dual
connect by level <= 1000000;
1000000 rows created.
SQL> insert into testA(s)
select 'Heyho || HeyheyHo' || to_char(level + 1000000) || 'HeyheyHo'
from dual
connect by level <= 1000000;
1000000 rows created.
SQL> insert into testB(s)
select 'Heyho || HeyheyHo' || to_char(level + 2000000) || 'HeyheyHo'
from dual
connect by level <= 1000000;
1000000 rows created.
SQL> insert into testA(s)
select 'Heyho || HeyheyHo' || to_char(level + 2000000) || 'HeyheyHo'
from dual
connect by level <= 1000000;
1000000 rows created.
SQL> select count(1), count(distinct s) from testA;
  COUNT(1) COUNT(DISTINCTS)
---------- ----------------
   3000000          3000000
SQL> select count(1), count(distinct s) from testB;
  COUNT(1) COUNT(DISTINCTS)
---------- ----------------
   3000000          3000000
SQL> set timing on
SQL> create table testA2(s) as
select regexp_replace(s, 'y', 'i', instr(s, '||')+2 ) as result
from testA;
Table created.
Elapsed: 00:00:17.66
SQL> create table testB2(s) as
select substr(s, 1, instr(s, '||')+1) ||
replace( substr(s, instr(s, '||')+2), 'y', 'i') as result
from testB;
Table created.
Elapsed: 00:00:06.96
SQL>
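As an alternative to one CREATE plus two INSERTs, the 3M test rows could probably be generated in a single statement by cross-joining two small CONNECT BY generators (3000 x 1000), so neither hierarchy goes deep. This is a sketch: the table name testC is hypothetical, and whether it also avoids the memory problem on a given system is an assumption to verify.

```sql
-- (a.n - 1) * 1000 + b.n runs from 1 to 3,000,000 without gaps or duplicates,
-- giving the same kind of distinct strings as testA/testB.
create table testC(s) as
select 'Heyho || HeyheyHo' || to_char((a.n - 1) * 1000 + b.n) || 'HeyheyHo'
from (select level n from dual connect by level <= 3000) a
cross join (select level n from dual connect by level <= 1000) b;
```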
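Here is a solution using regexp_replace. The 4th argument is the starting position. After some thought, I decided to keep the "+2": don't be lazy and waste cycles testing characters you already know are not target characters.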
SQL> with tbl(str) as (
select 'Heyho || HeyheyHo' from dual
)
select str before,
regexp_replace(str, 'y', 'i', instr(str, '||')+2) after
from tbl;
BEFORE AFTER
----------------- -----------------
Heyho || HeyheyHo Heyho || HeiheiHo
SQL>
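If the delimiter might change, the hard-coded +2 can follow the delimiter's length instead. A minimal sketch, where the params CTE is only an illustrative way of keeping the delimiter in one place:

```sql
-- Start position = first character after the delimiter, whatever its length.
with tbl(str) as (
  select 'Heyho || HeyheyHo' from dual
),
params(delim) as (
  select '||' from dual   -- hypothetical: the delimiter kept in one place
)
select str as before,
       regexp_replace(str, 'y', 'i', instr(str, p.delim) + length(p.delim)) as after
from tbl cross join params p;
```

For "||" this starts at position 9, exactly as the +2 version does, and returns "Heyho || HeiheiHo".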