Oracle SQL regexp_substr non-capturing/optional group

The expression:

Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\].+?\.(?: Target definition = (\d+))?.*

correctly produces the following matches:

Group 1.    24-30   494801
Group 2.    38-45   8280955
Group 3.    52-59   8336297
Group 4.    103-109 494767

for the input string:

Reassigning definition: 494801 from: [8280955] to: [8336297], advancing due dates. Target definition = 494767.

and the first 3 matches for the input string:

Reassigning definition: 494801 from: [8280955] to: [8336297], advancing due dates.

with the JavaScript, Python, PHP and Golang flavors (see https://regex101.com/r/Br66wm/3), but not with SQL regexp_substr:

with
  input_string as
  (
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
    union all
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.' as test_string from dual
   ),
   pattern_string as
   (
     select 'Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\].+?\.(?: Target definition = (\d+))?.*$' as pattern_string from dual
   )
select
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 1) as group_1,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 2) as group_2,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 3) as group_3,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 4) as group_4
from
  input_string i, pattern_string p;

Group 4 is always null. What is wrong with my use of the non-capturing group? Basically, the following sentence is optional in my input test strings:

 Target definition = 494767.
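
For comparison, the behaviour reported for the Python flavor can be reproduced with a minimal sketch using Python's `re` module. The pattern and both test strings are copied from the query above; the helper name `extract_groups` is only illustrative. This shows what a Perl-style engine does with `(?:...)?` and says nothing about how Oracle parses it.

```python
import re

# Pattern copied from the question; (?: ... )? is an optional non-capturing group.
PATTERN = re.compile(
    r"Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\].+?\."
    r"(?: Target definition = (\d+))?.*$"
)

INPUTS = [
    "Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.",
    "Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.",
]


def extract_groups(text):
    """Return the four capture groups as a tuple; a group that did not take part is None."""
    match = PATTERN.match(text)
    return match.groups() if match else None


for line in INPUTS:
    print(extract_groups(line))
# ('494801', '8280955', '8336297', '494767')
# ('494801', '8280955', '8336297', None)
```

In this flavor the optional sentence fills group 4 when present and leaves it as `None` otherwise, which is the result I was hoping to get from regexp_substr.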

This is a bit too long for a comment, so I'll write it here. I'll delete it if it turns out not to be useful.

If you are only ever looking for the numbers in these strings (regardless of what surrounds them), the query can be simplified to

SQL> with
  2    input_string as
  3    (
  4      select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
  5      union all
  6      select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.' as test_string from dual
  7     )
  8  select regexp_substr(test_string, '\d+', 1, 1) grp1,
  9         regexp_substr(test_string, '\d+', 1, 2) grp2,
 10         regexp_substr(test_string, '\d+', 1, 3) grp3,
 11         regexp_substr(test_string, '\d+', 1, 4) grp4
 12  from input_string;

GRP1       GRP2       GRP3       GRP4
---------- ---------- ---------- ----------
494801     8280955    8336297    494767
494801     8280955    8336297

SQL>

Or, an option that does not assume a fixed number of groups (although the layout differs from what you want):

SQL> with
  2    input_string as
  3    (
  4      select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
  5      union all
  6      select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates.' as test_string from dual
  7     )
  8  select column_value grp_rn,
  9         regexp_substr(test_string, '\d+', 1, column_value) grp
 10  from input_string cross join
 11    table(cast(multiset(select level from dual
 12                        connect by level <= regexp_count(test_string, '\d+')
 13                       ) as sys.odcinumberlist));

 GRP_RN GRP
------- ----------
      1 494801
      2 8280955
      3 8336297
      4 494767
      1 494801
      2 8280955
      3 8336297

7 rows selected.
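
The idea behind both queries (pick the n-th run of digits and ignore the surrounding text) can be sketched outside the database as well. The following is a minimal Python illustration, not Oracle code; the function names `nth_number` and `numbered_rows` are made up for this example.

```python
import re


def nth_number(text, occurrence):
    """Return the occurrence-th run of digits (1-based), or None if there are fewer."""
    numbers = re.findall(r"\d+", text)
    return numbers[occurrence - 1] if 1 <= occurrence <= len(numbers) else None


def numbered_rows(text):
    """Return (position, number) pairs, one per run of digits, like the second query."""
    return list(enumerate(re.findall(r"\d+", text), start=1))


with_target = "Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767."
without_target = "Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates."

print(nth_number(with_target, 4))     # 494767
print(nth_number(without_target, 4))  # None
print(numbered_rows(without_target))  # [(1, '494801'), (2, '8280955'), (3, '8336297')]
```

Note that this approach relies purely on position: any other number that appears earlier in the text would shift the results.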

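Because Oracle's POSIX-based regular expression implementation does not seem to support non-capturing groups, and the capture groups of regexp_substr are not readily available as separate columns, I ended up with the following, which basically uses a separate regular expression for the optional group.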

with
  input_string as
  (
    select 'Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.' as test_string from dual
    union all
    select 'Reassigning definition: 494767 from: [8336297] to: [8369944], advancing dates.' as test_string from dual
   ),
   pattern_string as
   (
     select 'Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\]' as pattern_string from dual
   )
select
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 1) as group_1,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 2) as group_2,
  regexp_substr(i.test_string, p.pattern_string, 1, 1, null, 3) as group_3,
  regexp_substr(i.test_string, 'Target definition = (\d+)', 1, 1, null, 1) as group_4
from
  input_string i, pattern_string p;
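
The logic of the query above (one pattern for the three mandatory numbers, a second independent pattern for the optional sentence) can be mirrored in a short Python sketch to show the shape of the result. This is only an illustration of the split-pattern idea, not output from an Oracle session; `split_extract` is a hypothetical helper name.

```python
import re

# Mandatory part: the three numbers that are always present.
MAIN = re.compile(r"Reassigning definition: (\d+) from: \[(\d+)\] to: \[(\d+)\]")
# Optional part: searched for separately, so no optional group is needed.
TARGET = re.compile(r"Target definition = (\d+)")


def split_extract(text):
    """Return (group_1, group_2, group_3, group_4); missing values are None."""
    main = MAIN.search(text)
    target = TARGET.search(text)
    head = main.groups() if main else (None, None, None)
    tail = target.group(1) if target else None
    return head + (tail,)


rows = [
    "Reassigning definition: 494801 from: [8280955] to: [8336297], advancing dates. Target definition = 494767.",
    "Reassigning definition: 494767 from: [8336297] to: [8369944], advancing dates.",
]
for row in rows:
    print(split_extract(row))
# ('494801', '8280955', '8336297', '494767')
# ('494767', '8336297', '8369944', None)
```

Because the two patterns are evaluated independently, the fourth value does not depend on how the engine treats optional or non-capturing groups.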