How to identify the words between a particular pattern using regexp in Oracle?
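I have a text field. I need to identify the words between the patterns <a href and a>.
The pattern can occur at the beginning, in the middle, or at the end of the text.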
with t as (
select '<a href Part of the technical Network Group www.tech.com/sites/ hh a>' as text from dual
union select '<a href www.tech.technical Network a>' as text from dual union
select 'www.tech.tech///technical <a href Network Group a>' as text from dual)
select * from t
WHERE REGEXP_LIKE(text,'(^|\W)<a href\S*','i')
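This gives me the correct result for the first two rows. But I also need to check for the word 'group' (case-insensitive). How can I check for the word 'group' and also make sure that the word lies inside the pattern? In this case, rows 1 and 3 should be returned.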
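Search for the complete pattern and then, within the matched substring, search for the word Group. If the text can contain more than one match, you can use a recursive subquery factoring clause to find all of them:
Oracle Setup: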
CREATE TABLE table_name ( id, text ) AS
select 1, '<a href Part of the technical Network Group www.tech.com/sites/ hh a>' from dual union all
select 2, '<a href www.tech.technical Network a>' from dual union all
select 3, 'www.tech.tech///technical <a href Network Group a>' from dual union all
select 4, '<a hrefgroup a>' FROM DUAL UNION ALL
select 5, '<a href groupa>' FROM DUAL UNION ALL
select 6, '<a href workgroup a>' FROM DUAL UNION ALL
select 7, '<a href test1 a> Group <a href test2 a>' FROM DUAL;
Query:
WITH positions ( id, text, match, position ) AS (
SELECT id,
text,
REGEXP_SUBSTR(
text,
'(^|\W)<a href\s+.*?\s+a>(\W|$)',
1,
1,
'i'
),
REGEXP_INSTR(
text,
'(^|\W)<a href\s+.*?\s+a>(\W|$)',
1,
1,
0,
'i'
)
FROM table_name
UNION ALL
SELECT id,
text,
REGEXP_SUBSTR(
text,
'(^|\W)<a href\s+.*?\s+a>(\W|$)',
position + 1,
1,
'i'
),
REGEXP_INSTR(
text,
'(^|\W)<a href\s+.*?\s+a>(\W|$)',
position + 1,
1,
0,
'i'
)
FROM positions
WHERE position > 0
)
SELECT id,
text
FROM positions
WHERE REGEXP_LIKE( match, '\sGroup\s', 'i' );
Output:
ID | TEXT
-: | :--------------------------------------------------------------------
1 | <a href Part of the technical Network Group www.tech.com/sites/ hh a>
3 | www.tech.tech///technical <a href Network Group a>
You can extend your regular expression like this: <a href.*group.*a>
with t as (
select '<a href Part of the technical Network Group www.tech.com/sites/ hh a>' as text from dual
union all select '<a href www.tech.technical Network a>' as text from dual
union all select 'www.tech.tech///technical <a href Network Group a>' as text from dual)
select * from t
WHERE REGEXP_LIKE(text,'<a href.*group.*a>','i')
| TEXT |
| :-------------------------------------------------------------------- |
| <a href Part of the technical Network Group www.tech.com/sites/ hh a> |
| www.tech.tech///technical <a href Network Group a> |
Note: this works as long as your text contains only one <a href ... a> pattern, which is the case in your sample data.
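To illustrate the caveat in the note above, here is a minimal sketch. The sample string is borrowed from row 7 of the setup data in the other answer and is not part of this answer's data; it contains two <a href ... a> patterns with the word Group sitting between them rather than inside either one.

```sql
-- Illustration only: 'Group' is outside both <a href ... a> patterns.
WITH t AS (
  SELECT '<a href test1 a> Group <a href test2 a>' AS text FROM dual
)
SELECT text
FROM   t
-- The greedy .* spans from the first '<a href' to the last 'a>',
-- so the word between the two patterns still satisfies the condition.
WHERE  REGEXP_LIKE(text, '<a href.*group.*a>', 'i');
```

The row is returned even though Group is not inside a pattern, which is why this expression is only safe when each text holds a single <a href ... a> pattern.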
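You can refine the regular expression so that it matches only the word 'group' (and not other words that contain 'group', such as 'workgroup' or 'grouped'):
<a href.*\sgroup\s.*a>
This works as long as <a href is always followed by a space and a> is always preceded by a space.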
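As a rough check of the refined expression, the sketch below runs it against a few single-pattern rows. The rows and their id labels are borrowed from the setup data in the other answer and are assumptions for illustration, not part of this answer.

```sql
-- Illustration only: rows copied from the other answer's setup data.
WITH t AS (
  SELECT 1 AS id, '<a href Part of the technical Network Group www.tech.com/sites/ hh a>' AS text FROM dual UNION ALL
  SELECT 3, 'www.tech.tech///technical <a href Network Group a>' FROM dual UNION ALL
  SELECT 4, '<a hrefgroup a>'      FROM dual UNION ALL  -- no space after href
  SELECT 5, '<a href groupa>'      FROM dual UNION ALL  -- no space before a>
  SELECT 6, '<a href workgroup a>' FROM dual            -- 'group' inside a longer word
)
SELECT id, text
FROM   t
-- \s on both sides requires 'group' to be delimited by whitespace.
WHERE  REGEXP_LIKE(text, '<a href.*\sgroup\s.*a>', 'i')
ORDER  BY id;
```

Only ids 1 and 3 should come back: in rows 4, 5 and 6 the word 'group' is not surrounded by whitespace on both sides. Note that \s accepts only whitespace, so a form such as 'Group,' followed by punctuation would not match either.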