Split Multiline CLOB Column - Oracle PL/SQL
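I have a table that includes a CLOB field whose values consist of lines of comma-separated values. In the output, I want one row for each line that starts with a particular value. I also want to extract some of the comma-separated values efficiently.
Input table (3 rows):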
id my_clob
001 500,aaa,bbb
500,ccc,ddd
480,1,2,bad
500,eee,fff
002 777,0,0,bad
003 500,yyy,zzz
Target output (4 rows):
id my_clob line_num line second_val
001 500,aaa,bbb 1 500,aaa,bbb aaa
500,ccc,ddd
480,1,2,bad
500,eee,fff
001 500,aaa,bbb 2 500,ccc,ddd ccc
500,ccc,ddd
480,1,2,bad
500,eee,fff
001 500,aaa,bbb 3 500,eee,fff eee
500,ccc,ddd
480,1,2,bad
500,eee,fff
003 500,yyy,zzz 1 500,yyy,zzz yyy
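This is very similar to another question, but in that case there was a non-hidden character to split on. A related question about removing line breaks is also very similar. I want to split on whatever character separates these lines. I have tried variations of chr(10) || chr(13) and [:space:]+ without success.
My attempt: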
SELECT
id
,my_clob
,level as line_num
,regexp_substr(my_clob,'^500,\S+', 1, level, 'm') as line
,regexp_substr(
regexp_substr(
my_clob,'^500,\S+', 1, level, 'm'
)
,'[^,]+', 1, 2
) as second_val
FROM tbl
CONNECT BY level <= regexp_count(my_clob, '^500,\S+', 1, 'm')
and prior id = id
and prior sys_guid() is not null
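The results usually come only from the first line in my_clob, depending on how I adjust the match pattern.
I am specifying the 'm' match_parameter because, according to the Oracle Docs: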
'm' treats the source string as multiple lines. Oracle interprets the caret (^) and dollar sign ($) as the start and end, respectively, of any line anywhere in the source string
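Is there something wrong with these regexp_*(my_clob, '^500,\S+', 1, 'm') calls? Or better yet, is there a more performant way to do this without regular expressions?
You can use REGEXP as follows: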
-- sample data
with your_data(id, myclob) as
 (select 1, '500,aaa,bbb
500,ccc,ddd
480,1,2,bad
500,eee,fff' from dual)
-- your query starts from here
select id, myclob, line_num, lines as line,
       regexp_substr(lines, '[^,]+', 1, 2) as second_val
from
 (select id, myclob, column_value as line_num,
         -- '.+' with 'm' returns the column_value-th non-empty line
         trim(regexp_substr(d.myclob, '.+', 1, column_value, 'm')) as lines
  from your_data d
  -- one generated row per line end found in the CLOB
  cross join table(cast(multiset(select level from dual
                                 connect by level <= regexp_count(d.myclob, '$', 1, 'm'))
                        as sys.OdciNumberList)) levels)
-- keep only lines starting with the literal prefix 500,
-- ('^[500]' would be a character class matching any line starting with 5 or 0)
where regexp_like(lines, '^500,');
ID MYCLOB LINE_NUM LINE SECOND_VAL
--- ------------------------- -------- --------------- ----------
1 500,aaa,bbb 1 500,aaa,bbb aaa
500,ccc,ddd
480,1,2,bad
500,eee,fff
1 500,aaa,bbb 2 500,ccc,ddd ccc
500,ccc,ddd
480,1,2,bad
500,eee,fff
1 500,aaa,bbb 4 500,eee,fff eee
500,ccc,ddd
480,1,2,bad
500,eee,fff
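Note that line_num in the output above is the position of the line within the CLOB, so the third matching row is numbered 4 rather than 3. If you want consecutive numbers per id, as in your target output, one option is to renumber after filtering. The following is only a sketch on the same sample data: raw_line_num is a name I made up for illustration, and it relies on analytic functions being evaluated after the WHERE clause.

```sql
-- Sketch: renumber the matching lines 1, 2, 3 ... per id.
-- raw_line_num is an illustrative alias, not part of the original query.
with your_data(id, myclob) as
 (select 1, '500,aaa,bbb
500,ccc,ddd
480,1,2,bad
500,eee,fff' from dual)
select id, myclob,
       -- row_number() is computed after the WHERE filter,
       -- so only the lines that survive the filter are counted
       row_number() over (partition by id order by raw_line_num) as line_num,
       lines as line,
       regexp_substr(lines, '[^,]+', 1, 2) as second_val
from
 (select d.id, d.myclob, levels.column_value as raw_line_num,
         trim(regexp_substr(d.myclob, '.+', 1, levels.column_value, 'm')) as lines
  from your_data d
  cross join table(cast(multiset(select level from dual
                                 connect by level <= regexp_count(d.myclob, '$', 1, 'm'))
                        as sys.OdciNumberList)) levels)
where regexp_like(lines, '^500,');
```

On the sample data this should number the three 500 lines consecutively while leaving line and second_val unchanged.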
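As for doing it without regular expressions: a possible alternative is to locate the line feeds with INSTR and cut the lines out with SUBSTR. This is a minimal sketch, assuming the lines are separated by chr(10), optionally preceded by chr(13). The names padded, split, s and n are illustrative. Whether it is actually faster on your data is something you would have to measure.

```sql
-- Sketch: split on chr(10) with INSTR/SUBSTR instead of REGEXP_*.
-- Assumption: lines end with chr(10), optionally preceded by chr(13).
with your_data(id, myclob) as
 (select 1, '500,aaa,bbb' || chr(10) || '500,ccc,ddd' || chr(10) ||
            '480,1,2,bad' || chr(10) || '500,eee,fff' from dual),
padded as
 -- wrap the text in line feeds so every line sits between two delimiters
 (select id, myclob, chr(10) || myclob || chr(10) as s from your_data),
split as
 (select p.id, p.myclob, n.column_value as line_num,
         -- text between the n-th and (n+1)-th line feed, minus a trailing CR
         rtrim(substr(p.s,
                      instr(p.s, chr(10), 1, n.column_value) + 1,
                      instr(p.s, chr(10), 1, n.column_value + 1)
                        - instr(p.s, chr(10), 1, n.column_value) - 1),
               chr(13)) as line
  from padded p
  -- number of lines = number of line feeds in s, minus one
  cross join table(cast(multiset(
         select level from dual
         connect by level <= length(p.s) - length(replace(p.s, chr(10))) - 1)
       as sys.OdciNumberList)) n)
select id, myclob, line_num, line,
       -- second field: text between the first and second comma
       substr(line,
              instr(line, ',', 1, 1) + 1,
              instr(line || ',', ',', 1, 2) - instr(line, ',', 1, 1) - 1) as second_val
from split
where line like '500,%';
```

Empty lines come back as NULL from SUBSTR and are dropped by the LIKE filter. As in the query above, line_num is the position within the CLOB, not a consecutive count of matching lines.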