PL/SQL: Split string by pattern
Similar to this question...
...I am trying to split the following string:
Spent 30 CAD in movie tickets at Cineplex on 2018-06-01
The output I want looks like this:
ELEMENT ELEMENT_VALUE
------- -------------
      1 Spent
      2 30
      3 CAD
      4 movie tickets
      5 Cineplex
      6 2018-06-01
Likewise, it should be able to handle:
Paid 600 EUR to Electric Company
producing:
ELEMENT ELEMENT_VALUE
------- -------------
      1 Paid
      2 600
      3 EUR
      4
      5 Electric Company
I tried this regular expression, but it did not work:
(\w+)(\D+)(\w+)(?(?=in)(\w+)(at)(\w+)(on)(.?$)|((?=to)(\w+)(.?$)))
I looked at several regex sites plus this post, with no luck:
Extract some part of text separated by delimiter using regex
Can anyone help?
Here is a simple SQL tokenizer that breaks on spaces:
select regexp_substr('Spent 30 CAD in movie tickets at Cineplex on 2018-06-01','[^ ]+', 1, level) from dual
connect by regexp_substr('Spent 30 CAD in movie tickets at Cineplex on 2018-06-01', '[^ ]+', 1, level) is not null
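The output you are asking for raises two problems. The first is how to define the tokens you want to exclude ('on', 'at', etc.). The second is how to ignore the spaces inside certain tokens ('Electric Company', 'movie tickets').
The first problem is easy to solve with a two-step process. Step #1 splits the string on spaces, and step #2 removes the unwanted tokens: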
with exclude as (
    select 'in' as tkn from dual union all
    select 'at' as tkn from dual union all
    select 'to' as tkn from dual union all
    select 'on' as tkn from dual
)
, str as (
    select id
         , level as element_order
         , regexp_substr(txt, '[^ ]+', 1, level) as tkn
    from t23
    where id = 10
    connect by level <= regexp_count(txt, '[^ ]+')+1
           and id = prior id
           and prior sys_guid() is not null
)
select row_number() over (partition by str.id order by str.element_order) as element
     , str.tkn as element_value
from str
     left join exclude on exclude.tkn = str.tkn
where exclude.tkn is null
and str.tkn is not null
;
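The query above reads from a table `t23` that is not shown. A minimal setup sketch, assuming only what the query implies (a numeric `id` column and a text `txt` column), might look like the following. The column types, the length of `txt`, and the second row with `id = 20` are my assumptions for illustration, not part of the original answer.

```sql
-- Hypothetical setup for the t23 table used by the query above.
-- Column types and sizes are assumptions; only the names id and txt
-- come from the query itself.
create table t23 (
    id  number
  , txt varchar2(200)
);

-- id = 10 matches the filter in the query; id = 20 is an extra sample row.
insert into t23 (id, txt)
values (10, 'Spent 30 CAD in movie tickets at Cineplex on 2018-06-01');

insert into t23 (id, txt)
values (20, 'Paid 600 EUR to Electric Company');

commit;
```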
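The second point is harder to solve. I guess you need another lookup table to identify the multi-word tokens, and maybe use listagg() to concatenate them.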
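One way the listagg() idea could be sketched, without a second lookup table, is to treat the excluded keywords as group boundaries and concatenate whatever falls between them. This is only a sketch under stated assumptions: the first three tokens are always verb, amount, and currency; the fourth token is one of the keywords; and the keyword list is hard-coded. The CTE names `src`, `tokens`, `marked`, and `grouped` are hypothetical. It does not reproduce the empty element 4 requested for the second sample string.

```sql
with src as (
    select 'Spent 30 CAD in movie tickets at Cineplex on 2018-06-01' as txt
    from dual
)
, tokens as (
    -- Step 1: split on spaces, keeping each token's position.
    select level as pos
         , regexp_substr(txt, '[^ ]+', 1, level) as tkn
    from src
    connect by level <= regexp_count(txt, '[^ ]+')
)
, marked as (
    -- Step 2: flag keywords, but only after the three leading tokens.
    select pos
         , tkn
         , case when pos > 3 and tkn in ('in', 'at', 'to', 'on')
                then 1 else 0
           end as is_sep
    from tokens
)
, grouped as (
    -- Step 3: the first three tokens each form their own group;
    -- after that, every keyword seen so far starts a new group.
    select pos
         , tkn
         , is_sep
         , case when pos <= 3 then pos
                else 3 + sum(is_sep) over (order by pos)
           end as grp
    from marked
)
-- Step 4: drop the keywords and join the remaining tokens per group.
select grp as element
     , listagg(tkn, ' ') within group (order by pos) as element_value
from grouped
where is_sep = 0
group by grp
order by grp;
```

With the sample string, `movie` and `tickets` land in the same group and are joined back into a single element. A keyword that legitimately appears inside a value (for example a company name containing `at`) would be split incorrectly, which is where a lookup table of known multi-word values would still be needed.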