How do I grab all the content from within [url], including square brackets, and match groups 1 and 2

Question

I have this regex

/\[url=(?:&quot;)?(.*?)(?:&quot;)?\](.*?)\[\/url\]/mi

and these blocks of text

[url=/someurl?page=5#3467]First[/url][postquote=true]
[url=/another_url/who-is?page=4#3396] Second[/url]
Some text[url=/another_url/who-is?page=3][i]3[/i] Third [/url]

and the regex works great at extracting the URLs and the text between the [url] tags:

Match 1

1.  /someurl?page=5#3467
2.  First

Match 2

1.  /another_url/who-is?page=4#3396
2.  Second

Match 3

1.  /another_url/who-is?page=3
2.  [i]3[/i] Third

The problem comes when I use the same regex to extract the URL from this text:

This is some text [url=https://www.somesite.com/location/?opt[]=apples]Link Name[/url]

Match 1

1.  https://www.somesite.com/location/?opt[
2.  =apples]Link Name
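
For anyone who wants to reproduce this, here is a minimal sketch of the failing case. The constant names ORIGINAL_URL_TAG and BRACKET_SAMPLE are made up for illustration; the regex and the input are copied from the question. The lazy (.*?) in group 1 stops at the first ] it can, which is the one inside opt[].

```ruby
# The regex from the question, unchanged.
ORIGINAL_URL_TAG = /\[url=(?:&quot;)?(.*?)(?:&quot;)?\](.*?)\[\/url\]/mi

# The input from the question whose URL contains an empty [] pair.
BRACKET_SAMPLE = "This is some text [url=https://www.somesite.com/location/?opt[]=apples]Link Name[/url]"

match = ORIGINAL_URL_TAG.match(BRACKET_SAMPLE)

# Group 1 ends at the first closing bracket, so the URL is cut short.
puts match[1]  # https://www.somesite.com/location/?opt[
# Group 2 picks up the rest of the URL along with the link text.
puts match[2]  # =apples]Link Name
```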

Notice the =apples] in the second group. What I need is for the first group to include it, so that the URL is captured whole, like

https://www.somesite.com/location/?opt[]=apples
Link Name

I have tried several modifications to this regex, but without success so far. Any help would be appreciated.

Answer 1

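Ruby regexes support duplicate named captures. With this feature you can easily handle both cases (one with &quot;, the other without). You don't need a recursive pattern, since I doubt that [] can be nested in the query part of a URL:

/\[url=(?:&quot;(?<url>[^&]*(?:&(?!quot;)[^&]*)*)&quot;|(?<url>[^\s\]\[]*(?:\[\][^\s\]\[]*)*))\](?<text>.*?)\[\/url\]/mi

The URL is in the named group url, and the content between the tags is in the named group text.

In a more readable format:

/
\[url=
(?:
    &quot;
    (?<url> [^&]* (?:&(?!quot;)[^&]*)* )
    &quot;
  |
    (?<url> [^\s\]\[]* (?:\[\][^\s\]\[]*)* )
)
\]
(?<text>.*?)\[\/url\]
/mix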
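
As a usage sketch, the pattern can be wrapped in a small helper. This assumes the quote entity is spelled &quot; as in the question's regex; BBCODE_URL and extract_links are hypothetical names, not part of any library. It uses scan with a block and reads Regexp.last_match, because scan without a block would return one array slot per url group, with nil for the branch that did not match.

```ruby
# The answer's pattern in free-spacing form (assumes the entity is &quot;).
BBCODE_URL = /
  \[url=
  (?:
      &quot;                                       # quoted form, opening entity
      (?<url> [^&]* (?: &(?!quot;) [^&]* )* )      # everything up to the closing entity
      &quot;
    |
      (?<url> [^\s\]\[]* (?: \[\] [^\s\]\[]* )* )  # unquoted form, permits empty bracket pairs
  )
  \]
  (?<text> .*? )
  \[\/url\]
/mix

# Hypothetical helper: returns a [url, text] pair for every [url] tag in input.
def extract_links(input)
  links = []
  input.scan(BBCODE_URL) do
    m = Regexp.last_match
    # With duplicate names, m[:url] refers to whichever url group took part in the match.
    links << [m[:url], m[:text]]
  end
  links
end

sample = "This is some text [url=https://www.somesite.com/location/?opt[]=apples]Link Name[/url]"
extract_links(sample).each { |url, text| puts "#{url} -> #{text}" }
# https://www.somesite.com/location/?opt[]=apples -> Link Name
```

Note that the text group is not trimmed, so a tag such as [url=/x] Second[/url] yields " Second" with its leading space; call strip on it if that matters.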

regex

ruby-on-rails

bbcode