How can I formulate this regex to avoid nesting or non-greedy matching?

Question

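I'm working on a regex-based filter for Drupal. This is the regex: /\[asciidoc]((.|\n)*)\[\/asciidoc]/. When the tags occur more than once in the text, everything from the first [asciidoc] to the last [/asciidoc] is matched as a single block.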

For example, given a piece of text like

[asciidoc] here is some text to be filtered[/asciidoc]
a bit of text
[asciidoc]some text in a second block[/asciidoc]

"here is some text to be filtered" and "some text in a second block" should each be processed by the filter, but instead

here is some text to be filtered[/asciidoc]
a bit of text
[asciidoc]some text in a second block

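is matched, spanning everything between the first opening marker and the last closing marker. When I test it on regex101, the explanation says the regex matches the contents of the block greedily, so what I need is a non-greedy regex that does not allow this kind of nesting of blocks.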

What should the correct regex be? I'm not familiar with regex terminology, so I may have used some terms incorrectly.

Answer 1

Use this regex, which combines the DOTALL flag (s) with a lazy quantifier (*?):

$re = '~\[asciidoc](.*?)\[/asciidoc]~s';

If you are using / as the regex delimiter in a language that doesn't support the DOTALL flag (such as JavaScript before ES2018), use:

/\[asciidoc]([\s\S]*?)\[\/asciidoc]/
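
To see the difference on the sample text from the question, here is a minimal JavaScript sketch that runs a greedy and a lazy version of the pattern side by side. The names text, greedy, lazy and blockBodies are made up for this illustration and are not part of Drupal or of the original filter; the g flag is added only so that every block can be collected.

```javascript
// Sample input from the question: two separate [asciidoc] blocks.
const text = [
  "[asciidoc] here is some text to be filtered[/asciidoc]",
  "a bit of text",
  "[asciidoc]some text in a second block[/asciidoc]",
].join("\n");

// Greedy: * consumes as much as possible, so the match runs to the LAST closing tag.
const greedy = /\[asciidoc]([\s\S]*)\[\/asciidoc]/g;

// Lazy: *? consumes as little as possible, so each match stops at the NEAREST closing tag.
const lazy = /\[asciidoc]([\s\S]*?)\[\/asciidoc]/g;

// Illustrative helper (hypothetical name): collect capture group 1 of every match.
function blockBodies(re, input) {
  return Array.from(input.matchAll(re), (m) => m[1]);
}

const greedyBodies = blockBodies(greedy, text);
const lazyBodies = blockBodies(lazy, text);

console.log(greedyBodies.length); // 1 (both blocks swallowed into one match)
console.log(lazyBodies.length);   // 2 (one match per block)
console.log(lazyBodies);
```

The lazy version returns the two block bodies separately, which is what the filter needs. Note that it only handles blocks placed one after another; an [asciidoc] block nested inside another one would still end at the first [/asciidoc] it meets.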
