Getting what's left after removing regex match
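Context is SQL on AS/400 (IBM i).
My goal is to end up with two values: a string identified by a regular expression I already have, and then everything else in the source string, with the regex match removed and the gap (if any) closed up.
Here is the SQL: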
select HAD1,
regexp_substr(HAD1,'\b(GATE|LEVEL|DOOR|UNITS?)\s[\dA-Z]{1,}'),
regexp_substr(HAD1,'**eventual_regex_goes_here**')
from ECH
where regexp_like(HAD1,'\bGATE')
Desired result:
|Ship To Address                 |REGEXP_SUBSTR |REGEXP_SUBSTR             |
|--------------------------------|--------------|--------------------------|
|D2 COMPOUND, GATE 11            |GATE 11       |D2 COMPOUND,              |
|2/22 GATEWAY DRIVE              |-             |2/22 GATEWAY DRIVE        |
|ASHBURTON FITTINGS GATE 2       |GATE 2        |ASHBURTON FITTINGS        |
|BRIERLY RD, GATE A, RIVER SIDE  |GATE A        |BRIERLY RD, , RIVER SIDE  |
|GATE 16, 37 KENEPURU DRIVE      |GATE 16       |, 37 KENEPURU DRIVE       |
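It would be nice if the second expression could also strip the commas, but that is not essential. The remaining string will go through other (non-regex) processing to remove extraneous elements (phone numbers, comments, punctuation, and so on).
The closest post suggested by the board software gives the following string: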
^.+?(?=\d{2})|(?<=\d{2}).+$
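So, first I tried substituting my whole expression for both occurrences of \d{2} and found that this (not surprisingly) would not process. Then I went back to a more basic test and tried to build up from there.
Let's try the word GATE as a constant, with a couple of boundaries thrown in (because deep down I'm still just a child, and you know what they say: "Children need boundaries").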
select had1,
regexp_substr(HAD1,'\b(GATE|LEVEL|DOOR|UNITS?)\s[\dA-Z]{1,}'),
regexp_substr(HAD1,'^.+?(?=\bGATE\b)|(?<=\bGATE\b).+$')
from ech
where regexp_like(HAD1,'\bGATE')
Result:
|Ship To Address                     |REGEXP_SUBSTR |REGEXP_SUBSTR                  |
|------------------------------------|--------------|-------------------------------|
|GATE 3, CNR QUARRY ROAD             |GATE 3        |3, CNR QUARRY ROAD             |
|ASHBURTON FITTINGS GATE 2           |GATE 2        |ASHBURTON FITTINGS             |
|GATE 6, HELLABYS ROAD               |GATE 6        |6, HELLABYS ROAD               |
|GATE 3, 548 PAKAKARIKI HILL         |GATE 3        |3, 548 PAKAKARIKI HILL         |
|GATE 5 - FLIGHTYS COMPOUND          |GATE 5        |5 - FLIGHTYS COMPOUND          |
|GATE 3 - 548 PAEKAKARIKI HILL ROAD  |GATE 3        |3 - 548 PAEKAKARIKI HILL ROAD  |
|GATE 14 - TAKAPU COMPOUND           |GATE 14       |14 - TAKAPU COMPOUND           |
|35 GATEWAY DRIVE                    |-             |-                              |
|GATE 6                              |GATE 6        |6                              |
|TAKAPU ROAD,GATE 20,SH1             |GATE 20       |TAKAPU ROAD,                   |
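This looks promising, bearing in mind that I haven't used the full expression for the second result column. But something is already not quite right.
The second and last rows should have more data, "2" and ",SH1" respectively. The string "35 GATEWAY DRIVE" should appear in the last column. I want everything except what the expression found (remember, at this point that is just the whole word GATE).
It seems it can return the remaining text on one side or the other of the removed text, but not on both sides at once, and it returns nothing at all when there is nothing to remove. So there is no point adding more complexity to include the gate number until I understand why I'm not getting all of the text that isn't GATE. I'll pause here and ask for help.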
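For anyone following along, the one-sided results can be reproduced outside Db2. The sketch below uses Python's `re` module as a stand-in for the Db2 for i regex functions (an assumption: the two engines are not identical), and the function names are made up for illustration. It suggests the cause: a substring search returns one contiguous match, so the alternation yields either the text before GATE or the text after it, never both, and nothing at all when the whole word GATE is absent. Replacing the match with an empty string keeps both sides.

```python
import re

# The pattern from the test query: text before GATE, or text after GATE.
SIDE_PATTERN = re.compile(r'^.+?(?=\bGATE\b)|(?<=\bGATE\b).+$')
WORD_PATTERN = re.compile(r'\bGATE\b')


def remainder_by_substr(address):
    """Mimic regexp_substr: return the first match, or None if there is none."""
    match = SIDE_PATTERN.search(address)
    return match.group(0) if match else None


def remainder_by_replace(address):
    """Mimic regexp_replace with an empty replacement: delete every match."""
    return WORD_PATTERN.sub('', address)


for address in ('TAKAPU ROAD,GATE 20,SH1', 'GATE 6', '35 GATEWAY DRIVE'):
    print(repr(address),
          repr(remainder_by_substr(address)),
          repr(remainder_by_replace(address)))
```

The substring version loses ",SH1" from the first address and returns nothing for GATEWAY, while the replace version keeps everything that is not the word GATE.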
You can try this:
with data (s) as (values
('D2 COMPOUND, GATE 11'),
('2/22 GATEWAY DRIVE'),
('ASHBURTON FITTINGS GATE 2'),
('BRIERLY RD, GATE A, RIVER SIDE'),
('GATE 16, 37 KENEPURU DRIVE')
)
select s,
regexp_substr(s,' ?(GATE|LEVEL|DOOR|UNITS) '),
replace(regexp_replace(s,' ?(GATE|LEVEL|DOOR|UNITS) ',''),',',' ')
from data
Result:
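|S                               |REGEXP_SUBSTR |REPLACE                  |
|--------------------------------|--------------|-------------------------|
|D2 COMPOUND, GATE 11            |GATE          |D2 COMPOUND 11           |
|2/22 GATEWAY DRIVE              |-             |2/22 GATEWAY DRIVE       |
|ASHBURTON FITTINGS GATE 2       |GATE          |ASHBURTON FITTINGS 2     |
|BRIERLY RD, GATE A, RIVER SIDE  |GATE          |BRIERLY RD A RIVER SIDE  |
|GATE 16, 37 KENEPURU DRIVE      |GATE          |16 37 KENEPURU DRIVE     |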
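I have marked the answer kindly provided by user2398621 as correct.
However, for those playing along at home, here is the full-fat, nearly-application-ready answer in the context of the data it will be applied to. Comments are enclosed like /* this */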
select distinct HAD1 ,
regexp_substr(HAD1 ,'\b(GATE|LEVEL|DOOR|UNITS?)\s[\dA-Z]{1,}'),
trim( /* remove leading/trailing blanks from REPLACE func */
replace( /* replace commas */
replace( /* replace slashes */
replace( /* replace dashes */
regexp_replace(HAD1 ,'\b(GATE|LEVEL|DOOR|UNITS?)\s[\dA-Z]{1,}',
'') /* replace extra address detail with null */
,'-',' ')
,'/',' ')
,',',' ')
)
from ECH
where regexp_like(HAD1 ,'\b(GATE|LEVEL|DOOR|UNITS?)\b')
and length(trim(HAD1 )) > 12 /* show only longish addresses in sample */
Sample GATE entries
|Ship To Address                  |REGEXP_SUBSTR |REGEXP_REPLACE        |
|---------------------------------|--------------|----------------------|
|GATE 6 52 MAHIA ROAD             |GATE 6        |52 MAHIA ROAD         |
|ASHBURTON FITTINGS GATE 2        |GATE 2        |ASHBURTON FITTINGS    |
|FIRST GATE AFTER THE ROUNDABOUT  |GATE AFTER    |FIRST THE ROUNDABOUT  |
|GATE 2, 61-63 NORMANBY ROAD      |GATE 2        |61 63 NORMANBY ROAD   |
|GATE 7, OFF MORRING STREET       |GATE 7        |OFF MORRING STREET    |
|GATE 7 OFF MORRIN STREET         |GATE 7        |OFF MORRIN STREET     |
|GATE 6 SUBSTATION ROAD           |GATE 6        |SUBSTATION ROAD       |
|VIA GATE 4, BUILDING 108         |GATE 4        |VIA BUILDING 108      |
Sample LEVEL entries (note that the blank REGEXP_REPLACE in the first row is correct, because LEVEL and UNIT, along with their numbers, have both been removed)
|Ship To Address                       |REGEXP_SUBSTR |REGEXP_REPLACE                |
|--------------------------------------|--------------|------------------------------|
|LEVEL 2 UNIT 16                       |LEVEL 2       |                              |
|TRANSPOWER HOUSE - LEVEL 8            |LEVEL 8       |TRANSPOWER HOUSE              |
|LEVEL 3/27 NAPIER STREET              |LEVEL 3       |27 NAPIER STREET              |
|LEVEL 2 GRAHAM STREET SERVICE CENTRE  |LEVEL 2       |GRAHAM STREET SERVICE CENTRE  |
|LEVEL 1 - MATT WILES                  |LEVEL 1       |MATT WILES                    |
|ANZ CENTRE, LEVEL 2                   |LEVEL 2       |ANZ CENTRE                    |
Sample DOOR entries
|Ship To Address       |REGEXP_SUBSTR |REGEXP_REPLACE |
|----------------------|--------------|---------------|
|NEXT DOOR TO 201      |DOOR TO       |NEXT 201       |
|WAREHOUSE DOOR A      |DOOR A        |WAREHOUSE      |
|DOOR 11 ( WAREHOUSE)  |DOOR 11       |( WAREHOUSE)   |
|DOOR 11 (WAREHOUSE)   |DOOR 11       |(WAREHOUSE)    |
Sample UNIT entries
|Ship To Address              |REGEXP_SUBSTR |REGEXP_REPLACE      |
|-----------------------------|--------------|--------------------|
|UNIT B 11 LANGSTONE LANE     |UNIT B        |11 LANGSTONE LANE   |
|26 BELFAST ROAD UNIT 1       |UNIT 1        |26 BELFAST ROAD     |
|UNIT C 589 TERMAINE AVE      |UNIT C        |589 TERMAINE AVE    |
|UNIT 1, 3 HENRY ROSE PLACE   |UNIT 1        |3 HENRY ROSE PLACE  |
|UNIT 1/12 ANVIL ROAD         |UNIT 1        |12 ANVIL ROAD       |
|UNIT D1, 269A MT SMART ROAD  |UNIT D1       |269A MT SMART ROAD  |
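You will notice there are still some anomalies, even in this small sample: sometimes removing the text selected by the expression leaves a meaningless remainder, sometimes there are dashes we still need to remove, and so on. But I would rather fix 5% of the cases by hand than 95% of them.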
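For readers without an IBM i to hand, here is a rough sketch of the same pipeline in Python. It is only an approximation: Python's `re` stands in for the Db2 regex functions, and `split_address` is a hypothetical name, not part of the query above. It follows the same order of operations: extract the first match, delete every match, turn dashes, slashes and commas into blanks, then trim.

```python
import re

# Same expression as the query: keyword, one whitespace, then digits/capitals.
DETAIL = re.compile(r'\b(GATE|LEVEL|DOOR|UNITS?)\s[\dA-Z]{1,}')


def split_address(address):
    """Return (detail, remainder), like the query's two result columns."""
    match = DETAIL.search(address)
    detail = match.group(0) if match else None  # regexp_substr: first match only
    remainder = DETAIL.sub('', address)         # regexp_replace: drop every match
    for punctuation in ('-', '/', ','):         # the three nested replace() calls
        remainder = remainder.replace(punctuation, ' ')
    return detail, remainder.strip(' ')         # trim() removes blanks only


for address in ('GATE 2, 61-63 NORMANBY ROAD',
                'LEVEL 2 UNIT 16',
                'NEXT DOOR TO 201'):
    print(repr(address), split_address(address))
```

The last address reproduces the "DOOR TO" anomaly: the expression accepts any run of capitals after the keyword, so an ordinary word is taken for a door number.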
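I know you have already marked an answer as correct, but here is a version without all those replace calls. The difference is that I also select the spaces and commas on either side of the initial regex and replace the whole lot with a single space; then, if that space ends up leading or trailing the string, I trim it off, like so: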
CREATE TABLE strtest
(string varchar(255));
INSERT INTO strtest
VALUES ('D2 COMPOUND, GATE 11'),
('2/22 GATEWAY DRIVE'),
('ASHBURN FITTINGS GATE 2'),
('BRIERLY RD, GATE A, RIVER SIDE'),
('GATE 16, 37 KENEPURU DRIVE');
select STRING,
regexp_substr(STRING,'\b(GATE|LEVEL|DOOR|UNITS?)\s[\dA-Z]{1,}'),
TRIM(regexp_REPLACE(STRING,'[ ,/-]*\b(GATE|LEVEL|DOOR|UNITS?)\s[\dA-Z]{1,}[ ,/-]*', ' '))
from STRTEST
where regexp_like(STRING,'\bGATE')
|STRING |REGEXP_SUBSTR |REGEX_REPLACE |
|---------------------------------------|---------------|------------------------|
|D2 COMPOUND, GATE 11 |GATE 11 |D2 COMPOUND |
|2/22 GATEWAY DRIVE | |2/22 GATEWAY DRIVE |
|ASHBURN FITTINGS GATE 2 |GATE 2 |ASHBURN FITTINGS |
|BRIERLY RD, GATE A, RIVER SIDE |GATE A |BRIERLY RD RIVER SIDE |
|GATE 16, 37 KENEPURU DRIVE |GATE 16 |37 KENEPURU DRIVE |
|LEVEL 3/27 NAPIER STREET |LEVEL 3 |27 NAPIER STREET |
|LEVEL 1 - MATT WILES |LEVEL 1 |MATT WILES |
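The magic is in the [ ,]* expressions I added to the start and end of the regex. If you want to catch dashes and slashes as well, make it [ ,/-]* as in the query above. Keep the hyphen last in the class: written as [ -,/] it would be read as the range of characters from space to comma, which does not include the hyphen itself.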
You still have those pesky DOOR TO and GATE AFTER entries, but they are few, and you can probably correct them afterwards.
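A quick way to check the single-pass pattern, and the effect of where the hyphen sits in the character class, is a sketch like the one below. Python's `re` is used as a stand-in for the Db2 regex functions (an assumption, since the engines are not identical), and `strip_detail` is a made-up helper name.

```python
import re

# Hyphen last: the class is space, comma, slash, or a literal hyphen.
SEPARATORS = r'[ ,/-]*'
PATTERN = re.compile(
    SEPARATORS + r'\b(GATE|LEVEL|DOOR|UNITS?)\s[\dA-Z]{1,}' + SEPARATORS)


def strip_detail(address):
    """Swap the detail and its surrounding separators for one space, then trim."""
    return PATTERN.sub(' ', address).strip(' ')


# With the hyphen in the middle, ' -,' is a range from space to comma.
print(re.fullmatch(r'[ -,/]', '-'))   # None: the hyphen itself is not matched
print(re.fullmatch(r'[ -,/]', '('))   # a match: '(' falls inside the range
print(re.fullmatch(r'[ ,/-]', '-'))   # a match: hyphen last is taken literally

for address in ('BRIERLY RD, GATE A, RIVER SIDE',
                'LEVEL 1 - MATT WILES',
                '2/22 GATEWAY DRIVE'):
    print(repr(address), '->', repr(strip_detail(address)))
```

Because the separators are consumed together with the match, no second pass of replace calls is needed for the punctuation next to the removed text; punctuation elsewhere in the address is left alone.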