Using regexp_substr on tab delimited record with spaces in fields

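Using Oracle 12c, how can I use regexp_substr to split a tab-delimited record whose fields may contain spaces? The record has four fields, and the third field contains words separated by spaces.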

I used this as a reference: Oracle Regex

Here is my query:

with tab_delimited_record as
 (select 'Field1    Field2  This is field3 and contains spaces  Field4' as record_with_fields from dual) 
 select record_with_fields,
        regexp_substr('\S+',1,3) as field3a, -- Expect ==>This is field3...
        regexp_substr('\t+',1,3) as field3b, -- Expect==>This is field3...
        regexp_substr('[[::space::]]+',1,3) as field_3c -- Another version
  from  tab_delimited_record

Desired result:

RECORD_WITH_FIELDS

Field1    Field2  This is field3 and contains spaces  Field4

FIELD3

This is field3 and contains spaces

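I believe you are looking for something like this. Note that this example returns all fields, but of course you can select just field 3 if that is all you need. The CTE builds a string with tab-delimited fields. The query then uses regexp_substr to get the nth string (the occurrence, given as the 4th argument) that is followed by a TAB or the end of the line. The 6th argument, 1, returns only the first captured group, so the delimiter itself is not part of the result.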

with tab_delimited_record(record_with_fields) as (
  select 'Field1'||chr(09)||'Field2'||chr(09)||'This is field3 and contains spaces'||chr(09)||'Field4' from dual
) 
select record_with_fields,
       regexp_substr(record_with_fields, '(.*?)('||chr(09)||'|$)', 1, 1, null, 1) as field_1, 
       regexp_substr(record_with_fields, '(.*?)('||chr(09)||'|$)', 1, 2, null, 1) as field_2, 
       regexp_substr(record_with_fields, '(.*?)('||chr(09)||'|$)', 1, 3, null, 1) as field_3,
       regexp_substr(record_with_fields, '(.*?)('||chr(09)||'|$)', 1, 4, null, 1) as field_4
from  tab_delimited_record;
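If the number of fields is not fixed, the same pattern can be driven by `level` so that each field comes back as its own row. This is a minimal sketch, assuming a single-row CTE like the one above (with several input rows, a plain `connect by level` would multiply them); the column names `field_no` and `field_value` are made up for the example.

```sql
-- Sketch: one row per tab-delimited field instead of one column per field.
-- Assumes the CTE returns exactly one record.
with tab_delimited_record(record_with_fields) as (
  select 'Field1'||chr(9)||'Field2'||chr(9)||'This is field3 and contains spaces'||chr(9)||'Field4' from dual
)
select level as field_no,
       -- occurrence = level; subexpression 1 drops the trailing tab
       regexp_substr(record_with_fields, '(.*?)('||chr(9)||'|$)', 1, level, null, 1) as field_value
from   tab_delimited_record
-- number of fields = number of tabs + 1
connect by level <= regexp_count(record_with_fields, chr(9)) + 1;
```

`regexp_count` counts the tabs, so a record with three tabs yields four rows; filter on `field_no = 3` to get only the third field.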

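In Oracle SQL you cannot write '\t' in a string literal and expect a tab character. You need to split the literal, concatenate chr(09) (the ASCII tab) and build the string that way. Try this: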

with tab_delimited_record as
 (select 'Field1'||chr(09)||'Field2'||chr(09)||'This is field3 and contains spaces'||chr(09)||'Field4' as record_with_fields from dual) 
select record_with_fields,
       regexp_substr(record_with_fields,'(\S+)\s+(\S+)\s+(.+)\s+',1,1,'',3) as field3a, -- Expect ==>This is field3...
       regexp_substr(record_with_fields,'(\S+)'||chr(09)||'(\S+)'||chr(09)||'(.+)\s+',1,1,'',3) as field3b, -- Expect==>This is field3...
       regexp_substr(record_with_fields,'(\S+)[[:space:]]+(\S+)[[:space:]]+(.+)[[:space:]]+',1,1,'',3) as field_3c -- Another version
  from tab_delimited_record;
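A quick way to see why `'\t'` does not work: Oracle does not process backslash escapes in string literals, so the literal is just a backslash followed by `t`. This small query (the column aliases are arbitrary) compares it with `chr(9)`.

```sql
-- Sketch: '\t' is two ordinary characters in Oracle, chr(9) is one tab.
select length('\t')   as backslash_t_len,  -- backslash + "t"
       length(chr(9)) as tab_len,          -- a single tab character
       ascii(chr(9))  as tab_code          -- ASCII code of the tab
from dual;
```

The first column comes back as 2 and the second as 1, which is why the tab has to be concatenated into the pattern with `chr(9)`.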

Another version of the regular expression:

with tab_delimited_record(record_with_fields) as (
  select 'Field1'||chr(09)||'Field2'||chr(09)||'This is field3 and contains spaces'||chr(09)||'Field4' from dual
) 
select record_with_fields,
       regexp_substr(record_with_fields, '[^'||chr(09)||']+', 1, 3) as field_3
from  tab_delimited_record;
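One difference between the two patterns shows up when a field is empty. `[^TAB]+` matches runs of non-tab characters, so two consecutive tabs produce no match and later fields shift down by one, while the `(.*?)(TAB|$)` pattern still counts the empty field. A minimal sketch with field 2 left empty (the aliases are illustrative):

```sql
-- Sketch: field 2 is empty (two tabs in a row).
with tab_delimited_record(record_with_fields) as (
  select 'Field1'||chr(9)||chr(9)||'This is field3 and contains spaces'||chr(9)||'Field4' from dual
)
select -- negated class: skips the empty field
       regexp_substr(record_with_fields, '[^'||chr(9)||']+', 1, 3) as field_3_negated_class,
       -- lazy group: counts the empty field
       regexp_substr(record_with_fields, '(.*?)('||chr(9)||'|$)', 1, 3, null, 1) as field_3_lazy_group
from tab_delimited_record;
```

Here the negated-class version returns `Field4` as field 3, while the lazy-group version returns `This is field3 and contains spaces`. Asking the lazy-group version for occurrence 2 gives NULL, since Oracle treats an empty string as NULL.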