RegEx for excluding a specified group of tags within a set of tags

Question

<info>
<owner>
<p>Owners:</p>
<p>1. New Owner_1</p>
<p>2. New Owner_2</p>
</owner>
<addons>
<p>Name of dog: Alex</p>
<p>1. Text blah blah blah</p>
<p>2. Text blah blah blah</p>
<p>3. Text blah blah blah</p>
<p>4. Text blah blah blah</p>
<p>OR MORE Text blah blah blah</p>
</addons>
<p>DETAILS</p>
<p>1. Vicky Mears 1st dog's owner.</p>
<p>2. Paul Nash 2nd dog's owner.</p>
<p>3. Dog found last Apr. 2016</p>
</info>

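Hello, I am currently learning regular expressions for school. My teacher pointed out a problem with the structure shown above.

He asked us how to find: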

<p>1. ...</p>
<p>2. ...</p>
<p>3. ...</p>

but not the ones inside:

<p>DETAILS</p>
<p>1. Vicky Mears 1st dog's owner.</p>
<p>2. Paul Nash 2nd dog's owner.</p>
<p>3. Dog found last Apr. 2016</p>
</info>

The <owner></owner> and <addons></addons> tags will sometimes be different, so there is no need to specify the parent tag; it is enough to exclude everything found in:

<p>DETAILS</p>
<p>1. Vicky Mears 1st dog's owner.</p>
<p>2. Paul Nash 2nd dog's owner.</p>
<p>3. Dog found last Apr. 2016</p>
</info>

I used this:

(?s)<p>DETAILS</p>(.*?)</info>

but it finds exactly the part I want to exclude.

Can anyone help me with this? Stack Overflow is my last resort.

PS: Searching with RegEx only, in Notepad++ v6.8.3

Answer 1

I would do it this way:

<p>DETAILS[\s\S]*|(<p>.*?<\/p>)|.*

Demo: https://regex101.com/r/IB9QFu/1/

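The idea is to use the alternation operator | to first match, and thereby consume, everything from <p>DETAILS onward, and only then try to match <p>..</p>. Because the first alternative has no capture group, only the paragraphs that come before DETAILS end up in group 1; the final .* alternative swallows every other line.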
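To see which paragraphs land in group 1, here is a small Python sketch of the same alternation. It is only an illustration: the question is about Notepad++, Python's re module is a different engine, and the shortened SAMPLE text and the function name paragraphs_before_details are made up for this example.

```python
import re

# Shortened version of the structure from the question (illustrative only).
SAMPLE = """<info>
<owner>
<p>Owners:</p>
<p>1. New Owner_1</p>
</owner>
<addons>
<p>1. Text blah blah blah</p>
</addons>
<p>DETAILS</p>
<p>1. Vicky Mears 1st dog's owner.</p>
</info>
"""

# Alternative 1 consumes everything from <p>DETAILS to the end (no group).
# Alternative 2 captures a single <p>..</p> in group 1.
# Alternative 3 consumes any other line without capturing it.
PATTERN = re.compile(r"<p>DETAILS[\s\S]*|(<p>.*?</p>)|.*")


def paragraphs_before_details(text):
    """Return the <p>..</p> elements that were captured in group 1."""
    return [m.group(1) for m in PATTERN.finditer(text) if m.group(1) is not None]


if __name__ == "__main__":
    for paragraph in paragraphs_before_details(SAMPLE):
        print(paragraph)
```

Note that every <p> before DETAILS is captured, including unnumbered ones such as <p>Owners:</p>; none of the paragraphs after DETAILS are.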

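In Notepad++ you can then keep only the wanted <p>..</p> elements by running Replace All with a replacement that refers to capture group 1 (written $1 or \1 in Notepad++).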
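The Replace All step can be imitated outside Notepad++ as well. The following Python sketch assumes the replacement is a backreference to group 1; the SAMPLE text and the name keep_wanted_paragraphs are hypothetical, and Python's re engine is used here in place of the Notepad++ one.

```python
import re

# Shortened version of the structure from the question (illustrative only).
SAMPLE = """<info>
<owner>
<p>1. New Owner_1</p>
<p>2. New Owner_2</p>
</owner>
<p>DETAILS</p>
<p>1. Vicky Mears 1st dog's owner.</p>
</info>
"""

PATTERN = re.compile(r"<p>DETAILS[\s\S]*|(<p>.*?</p>)|.*")


def keep_wanted_paragraphs(text):
    """Replace every match with group 1, like Replace All with a group-1 backreference.

    Matches from the alternatives without a group are replaced with an
    empty string (Python 3.5+ treats an unmatched group as empty).
    """
    return PATTERN.sub(r"\1", text)


if __name__ == "__main__":
    result = keep_wanted_paragraphs(SAMPLE)
    # Lines that held other tags are now empty; drop them for display.
    print("\n".join(line for line in result.splitlines() if line))
```

The line breaks themselves are not matched, so the other tags leave empty lines behind; those need a separate cleanup step if they are unwanted.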

If you want to test it in RegExr, use this:

/<p>DETAILS[\s\S]*|(<p>.*?<\/p>)|^.*$/gm


regex

html-parsing

regex-group

regex-greedy

regex-lookarounds