使用正则表达式从逗号分隔列表中删除重复项

Question

我有

contract, clause 1, Subsection 1.1, contract, clause 1, Subsection 1.2,
paragraph (a), contract, clause 1, Subsection 1.2, paragraph (b), contract, 
clause 2

我想得到

contract, clause 1, Subsection 1.1, Subsection 1.2, paragraph (a), paragraph 
(b), clause 2

我发现正则表达式可以做到这一点，但我找不到要使用哪个字符串来做到这一点

请帮忙..

Answer 1

基于this link将逗号分隔值拆分为行，我将字符串拆分为行，保留第一次出现的位置，重新聚合值

with test_string as ( 
select 1 as id,
 'contract, clause 1, Subsection 1.1, contract, clause 1, Subsection 1.2, paragraph (a), contract, clause 1, Subsection 1.2, paragraph (b), contract, clause 2' val 
from dual)
select id, listagg(word,', ') WITHIN GROUP (order by position) FROM (
select distinct id, first_value(position) over ( partition by word order by position ) position, word from (
select 
  distinct t.id,
  levels.column_value as position,
  trim(regexp_substr(t.val, '[^,]+', 1, levels.column_value))  as word
from 
  test_string t,
  table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.val, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
  )
) GROUP BY id

如果您不想保留订单

with test_string as ( 
select 1 as id,
 'contract, clause 1, Subsection 1.1, contract, clause 1, Subsection 1.2, paragraph (a), contract, clause 1, Subsection 1.2, paragraph (b), contract, clause 2' val 
from dual)
select id, listagg(word,', ') WITHIN GROUP (order by 1) FROM (
select 
  distinct t.id,
  trim(regexp_substr(t.val, '[^,]+', 1, levels.column_value))  as word
from 
  test_string t,
  table(cast(multiset(select level from dual connect by  level <= length (regexp_replace(t.val, '[^,]+'))  + 1) as sys.OdciNumberList)) levels
) GROUP BY id

使用正则表达式从逗号分隔列表中删除重复项

Remove duplicates from comma separated list with regexp

oracle

plsql

regexp-replace