PL/SQL Split string by pattern

Question

Similar to this question...

...I am trying to split the following string:

Spent 30 CAD in movie tickets at Cineplex on 2018-06-01

The output I want looks like this:

ELEMENT ELEMENT_VALUE
------- -------------
      1 Spent
      2 30
      3 CAD
      4 movie tickets
      5 Cineplex
      6 2018-06-01

Likewise, it should be able to handle:

Paid 600 EUR to Electric Company

producing:

ELEMENT ELEMENT_VALUE
------- -------------
      1 Paid
      2 600
      3 EUR
      4 
      5 Electric Company

I tried this regular expression, but it did not work:

(\w+)(\D+)(\w+)(?(?=in)(\w+)(at)(\w+)(on)(.?$)|((?=to)(\w+)(.?$)))

I looked at several regex websites, plus this post, without any luck:

Extract some part of text separated by delimiter using regex

Can anyone help?

Answer 1

Here is a simple SQL tokenizer that breaks on spaces:

select regexp_substr('Spent 30 CAD in movie tickets at Cineplex on 2018-06-01','[^ ]+', 1, level) from dual
connect by regexp_substr('Spent 30 CAD in movie tickets at Cineplex on 2018-06-01', '[^ ]+', 1, level) is not null

From: https://blogs.oracle.com/aramamoo/how-to-split-comma-separated-string-and-pass-to-in-clause-of-select-statement
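To line the result up with the ELEMENT / ELEMENT_VALUE layout in the question, the same tokenizer can also return the token position. This is only a sketch: the `src` subquery and the column aliases are names chosen here for illustration, and `regexp_count` assumes Oracle 11g or later.

```sql
-- Sketch: the space tokenizer above, plus the position of each token.
-- "src" and the aliases are illustrative names, not part of the original answer.
with src as (
  select 'Spent 30 CAD in movie tickets at Cineplex on 2018-06-01' as txt
  from dual
)
select level as element
       , regexp_substr(txt, '[^ ]+', 1, level) as element_value
from src
-- one row per run of non-space characters
connect by level <= regexp_count(txt, '[^ ]+');
```

Because it splits on every space, this returns ten rows for the sample string: 'in', 'at' and 'on' come back as elements of their own, and 'movie tickets' is split in two.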

Answer 2

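There are two problems with the output you are asking for. The first is how to define the tokens you want to exclude ('on', 'at', etc.). The second is how to ignore the spaces inside certain tokens ('Electric Company', 'movie tickets').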

The first problem is easily solved with a two-step process. Step #1 splits the string on spaces; step #2 removes the unwanted tokens:

with exclude as (
  -- tokens to drop from the result
  select 'in' as tkn from dual union all
  select 'at' as tkn from dual union all
  select 'to' as tkn from dual union all
  select 'on' as tkn from dual
  )
  , str as (
    -- step #1: one row per space-separated token
    select id
           , level as element_order
           , regexp_substr(txt, '[^ ]+', 1, level) as tkn
    from t23
    where id = 10
    -- the +1 yields one trailing null token, filtered out in the final query
    CONNECT BY level <= regexp_count(txt, '[^ ]+')+1
    and id = prior id
    -- keeps CONNECT BY from reporting a loop when it links rows by id
    and prior sys_guid() is not null
    )
 -- step #2: anti-join against the exclusion list and renumber what remains
 select row_number() over (partition by str.id order by str.element_order) as element
        , str.tkn as element_value
 from str
      left join exclude on exclude.tkn = str.tkn
 where exclude.tkn is null
 and str.tkn is not null
 ;

Here is a SQL Fiddle demo.
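The query above reads from a table `t23` whose definition is not shown here. A minimal setup for trying it out might look like the following; the table and column names are taken from the query, but the column types, the second row and its id are assumptions.

```sql
-- Hypothetical setup: t23, id and txt come from the query above;
-- the data types and the row with id = 20 are assumptions.
create table t23 (
  id  number primary key,
  txt varchar2(200)
);

insert into t23 (id, txt)
values (10, 'Spent 30 CAD in movie tickets at Cineplex on 2018-06-01');

insert into t23 (id, txt)
values (20, 'Paid 600 EUR to Electric Company');

commit;
```

With this data, the query filtered on `id = 10` drops 'in', 'at' and 'on', but it still returns 'movie' and 'tickets' as two separate elements.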

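The second problem is harder to solve. My guess is that you would need another lookup table to identify the ringers, and perhaps use listagg() to concatenate them.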
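One possible way to put the lookup-table-plus-listagg() idea into practice is sketched below. It is a variation rather than the exact approach described: instead of listing the multi-word values, the lookup table (`kw`, a hypothetical name) maps each keyword to the output slot it introduces, and the words between keywords are glued back together with listagg(). It assumes the first three words are always the verb, the amount and the currency, and that the keywords never occur inside a value.

```sql
with src as (
  select 'Spent 30 CAD in movie tickets at Cineplex on 2018-06-01' as txt
  from dual
)
, kw as (
  -- hypothetical lookup: keyword -> output slot it introduces
  select 'in' as tkn, 4 as slot from dual union all
  select 'at', 5 from dual union all
  select 'to', 5 from dual union all
  select 'on', 6 from dual
)
, tok as (
  -- split on spaces, keeping each token's position
  select level as pos
         , regexp_substr(txt, '[^ ]+', 1, level) as tkn
  from src
  connect by level <= regexp_count(txt, '[^ ]+')
)
, tagged as (
  select tok.pos
         , tok.tkn
         , kw.slot as kw_slot
         -- the first three tokens map to slots 1 to 3; every later token
         -- inherits the slot of the most recent keyword before it
         , case
             when tok.pos <= 3 then tok.pos
             else last_value(kw.slot ignore nulls) over (order by tok.pos)
           end as slot
  from tok
       left join kw on kw.tkn = tok.tkn
)
select slot as element
       , listagg(tkn, ' ') within group (order by pos) as element_value
from tagged
where kw_slot is null      -- drop the keywords themselves
group by slot
order by slot;
```

For the first sample string this is intended to give the six elements requested in the question, with 'movie tickets' as a single value. For 'Paid 600 EUR to Electric Company' it returns elements 1, 2, 3 and 5 only; if an empty row for element 4 is required, the result would need an outer join against a list of slot numbers.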
