Oracle INSTR exact match
I have the following query:
SQL> select * from RTECS_ABBREV ra
2 where instr(trim('100 mmol/plate (-S9)'), ra.abbrev) > 0;
ABBREV                         DEFINITION
------------------------------ --------------------------------------------------------------------------------
mmo                            Mutation in Micro-organism
mmol                           millimole
mol                            mole
S                              second
SQL>
I would like to get the following result instead:
SQL> select * from RTECS_ABBREV ra
2 where instr(trim('100 mmol/plate (-S9)'), ra.abbrev) > 0;
ABBREV                         DEFINITION
------------------------------ --------------------------------------------------------------------------------
mmol                           millimole
S                              second
SQL>
because "mmo" and "mol" are only parts of the word "mmol".
More details:
Suppose I have the following data:
with abbr as
(
select 'mmo' as abbrev from dual union
select 'mmol' as abbrev from dual union
select 'mol' as abbrev from dual union
select 'ug' as abbrev from dual union
select 'mg' as abbrev from dual union
select 'ppm' as abbrev from dual union
select 'nmol' as abbrev from dual union
select 'nm' as abbrev from dual union
select 'ol' as abbrev from dual union
select 'S' as abbrev from dual
),
main_data as
(
select '24231' as id_, '10 ug/plate (-S9)' as data_ from dual union
select '24232' as id_, '1 pph' as data_ from dual union
select '24233' as id_, '100 mmol/plate (-S9)' as data_ from dual union
select '24234' as id_, '100 mmol/plate (-S9)' as data_ from dual union
select '24235' as id_, '1 pph' as data_ from dual union
select '24236' as id_, '19300 nmol/L (-S9)' as data_ from dual union
select '24237' as id_, '800 mg/L' as data_ from dual union
select '24238' as id_, '600 ppm/2H-C (-S9)' as data_ from dual union
select '24239' as id_, '500 mg/L (-S9)' as data_ from dual union
select '24240' as id_, '2000 ppm (-S9)' as data_ from dual union
select '24241' as id_, '100 mmol/plate (-S9)' as data_ from dual union
select '24242' as id_, '1 pph (-S9)' as data_ from dual union
select '24243' as id_, 'ihl 2700 ppm' as data_ from dual union
select '24244' as id_, 'par 10 mmol/L' as data_ from dual union
select '24245' as id_, 'mul 1 pph/8H-C' as data_ from dual
)
select * from main_data
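In "main_data.data_", I need to replace every matching word that appears in "abbr.abbrev" with another string (for example, "test").
For example, for "100 mmol/plate (-S9)", I need:
100 test/plate (-test9)
but not
100 testl/plate (-test9) or 100 testol/plate (-test9)
So the rule seems to be: replace whole-word matches from "abbr.abbrev", and if the string is between (), replace any matching characters.
Given your sample data, I think you need something like the following: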
SELECT * FROM main_data INNER JOIN abbr
ON REGEXP_LIKE(main_data.data_, '(^|\W)' || abbr.abbrev || '(\W|$)');
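I use the regular expression above because Oracle regular expressions do not support the word-boundary anchor (\b). In the first group I check for the start of the string or a "non-word" character (one that is neither alphanumeric nor an underscore _). In the second (closing) group I check for a non-word character or the end of the string.
It strikes me that if a unit is always preceded by some kind of measurement, then checking for the start of the string (the ^ anchor) is not really necessary.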
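To make the boundary check concrete, here is a minimal sketch that runs the same pattern against one string from the question. The inline `abbr` rows are copied from the sample data; nothing here depends on the real RTECS_ABBREV table. It also assumes the abbreviations contain no regex metacharacters, since they are concatenated into the pattern unescaped.

```sql
-- Illustrative only: a cut-down copy of the question's abbreviation list.
WITH abbr AS (
  SELECT 'mmo'  AS abbrev FROM dual UNION ALL
  SELECT 'mmol' FROM dual UNION ALL
  SELECT 'mol'  FROM dual UNION ALL
  SELECT 'S'    FROM dual
)
SELECT abbrev
FROM   abbr
-- The abbreviation must be preceded by start-of-string or a non-word
-- character, and followed by a non-word character or end-of-string.
WHERE  REGEXP_LIKE('100 mmol/plate (-S9)',
                   '(^|\W)' || abbrev || '(\W|$)');
```

With this pattern, "mmo" and "mol" are rejected because a letter sits directly next to them. Note that "S" is rejected as well: in "(-S9)" it is followed by the digit 9, which counts as a word character. If the "match anything inside parentheses" rule from the question is required, it would need separate handling beyond this pattern.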
If you want to do the replacement, you will need to use REGEXP_REPLACE() with the regular expression above rather than just REGEXP_LIKE().
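As a rough sketch of that replacement, the query below reuses the same pattern and puts the captured boundary characters back with the backreferences `\1` and `\2`, so only the abbreviation itself is swapped for "test". The two `main_data` rows and three `abbr` rows are taken from the question's sample data; the column alias `replaced` is made up for illustration.

```sql
-- Illustrative only: a small subset of the question's sample data.
WITH main_data AS (
  SELECT '24233' AS id_, '100 mmol/plate (-S9)' AS data_ FROM dual UNION ALL
  SELECT '24237', '800 mg/L' FROM dual
),
abbr AS (
  SELECT 'mmol' AS abbrev FROM dual UNION ALL
  SELECT 'mol'  FROM dual UNION ALL
  SELECT 'mg'   FROM dual
)
SELECT m.id_,
       a.abbrev,
       -- \1 and \2 restore the delimiters captured on either side,
       -- so '100 mmol/plate' keeps its space and slash.
       REGEXP_REPLACE(m.data_,
                      '(^|\W)' || a.abbrev || '(\W|$)',
                      '\1test\2') AS replaced
FROM   main_data m
JOIN   abbr a
  ON   REGEXP_LIKE(m.data_, '(^|\W)' || a.abbrev || '(\W|$)');
```

This handles one abbreviation per joined row. A string that contains several different abbreviations would need the replacements chained, for example in a PL/SQL loop or a recursive query. Because each match consumes its surrounding delimiters, two abbreviations separated by a single character may not both be replaced in one pass.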