Multiple captures on same text with regexp_matches() using different regex patterns

Question

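Given the following title, Olympic National Park, WA. [OC][5239x3492], the goal is to capture the tags OC and 5239x3492, and also to capture 5239 and 3492 as two separate captures. The idea was to use a series of positive lookaheads (?=) to perform N matches without consuming input, e.g. [\[\(\{](?=[a-zA-Z0-9\-_ \/]+)(?=[0-9]+)[\}\)\]], but this only leads to a bunch of empty strings (and confusion). I seem to misunderstand how regexp_matches or positive lookaheads work; any help is appreciated.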

Statement:

SELECT (
  regexp_matches(
    'Olympic National Park, WA. [OC][5239x3492]', 
    '[\[\(\{]([a-zA-Z0-9\-_ \/]+)[\}\)\]]', 
    'gi'
  )
);

Current output:

 regexp_matches 
----------------
 {OC}
 {5239x3492}
(2 rows)

Desired output:

 regexp_matches 
----------------
 {OC}
 {5239x3492}
 {5239}
 {3492}
(4 rows)

Answer 1

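Matches cannot overlap in any regex implementation: once a match has consumed part of the text, the next match starts after it. What you can do, however, is split matches such as 5239x3492 afterwards with a second pattern: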

select     u
from       t
cross join regexp_matches(col, '[\[\(\{]([a-z0-9 \/_-]+)[\}\)\]]', 'gi') m
left join  regexp_matches(m[1], '(\d+)x(\d+)', 'gi') s on true
cross join unnest(m || s) u
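
For anyone who wants to try the query above without creating a table, here is a minimal self-contained sketch. The CTE that stands in for t and col is an assumption added for illustration (its single row is the title from the question), and the comments describe how I read each join:

```sql
-- Hypothetical sample data: the names "t" and "col" mirror the query above;
-- the single row is the title from the question.
WITH t(col) AS (
  VALUES ('Olympic National Park, WA. [OC][5239x3492]')
)
SELECT u
FROM t
-- First pass: one row per bracketed tag; m is a one-element text[] such as {OC}.
CROSS JOIN regexp_matches(col, '[\[\(\{]([a-z0-9 \/_-]+)[\}\)\]]', 'gi') AS m
-- Second pass over the captured tag only; s is NULL when the tag is not "NxM".
LEFT JOIN regexp_matches(m[1], '(\d+)x(\d+)', 'gi') AS s ON true
-- m || s is the tag followed by its parts; a NULL s leaves m unchanged.
CROSS JOIN unnest(m || s) AS u;
```

This should yield the four values OC, 5239x3492, 5239 and 3492 as plain text rows rather than one-element arrays. Row order is not guaranteed without an ORDER BY.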

http://rextester.com/VKDTON30263
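
As for why the lookahead attempt from the question does not produce the captures: a lookahead is a zero-width assertion, so it checks what follows without moving the match position, and the PostgreSQL documentation states that parentheses inside lookahead constraints are treated as non-capturing. The sketch below simply reruns the question's pattern against the sample title; the reasoning in the comments is my own reading, not something verified in the original post:

```sql
-- The pattern from the question. After the opening bracket, both lookaheads
-- inspect the next character without consuming it, so the closing bracket
-- [\}\)\]] has to appear at that very same position. A character cannot be
-- alphanumeric, a digit and a closing bracket at once, so no match is expected.
SELECT regexp_matches(
  'Olympic National Park, WA. [OC][5239x3492]',
  '[\[\(\{](?=[a-zA-Z0-9\-_ \/]+)(?=[0-9]+)[\}\)\]]',
  'g'
);
```

As far as I can tell this returns no rows for the sample title, which is why splitting each tag in a second pass is the more practical route.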


sql

regex

postgresql

regex-lookarounds