Regex to match only the first occurrence of an HTML element

Question

Yes, I know, "don't parse HTML with regex". I'm doing this in Notepad++ and it's a one-time thing, so please bear with me for a moment.

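I'm trying to simplify some HTML code by using some more advanced techniques. Notably, my documents contain "inserts" or "callouts" or whatever you want to call them: short "note", "warning" and "technical" passages that draw the reader's attention to important information: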

<div class="note">
    <p><strong>Notes</strong>: This icon shows you something that complements 
     the information around it. Understanding notes is not critical but 
     may be helpful when using the product.</p>
</div>
<div class="warning">
    <p><strong>Warnings</strong>: This icon shows information that may 
     be critical when using the product. 
     It is important to pay attention to these warnings.</p>
</div>
<div class="technical">
    <p><strong>Technical</strong>: This icon shows technical information 
     that may require some technical knowledge to understand. </p>
</div>

I want to simplify this HTML to the following:

<div class="box note"><strong>Notes</strong>: This icon shows you something that complements 
     the information around it. Understanding notes is not critical but 
     may be helpful when using the product.</div>
<div class="box warning"><strong>Warnings</strong>: This icon shows information that may 
     be critical when using the product. 
     It is important to pay attention to these warnings.</div>
<div class="box technical"><strong>Technical</strong>: This icon shows technical information 
     that may require some technical knowledge to understand.</div>

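I almost have the regex I need for a global search and replace across my project in Notepad++, but it isn't picking up "only" the first div, it's picking up all of them. If my cursor is at the beginning of the file and I click Find, the "selection" runs essentially from the first <div class="something"> to the last </div>.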

Here is my expression: <div class="(.*[^"])">[^<]*<p>(.*?)<\/p>[^<]*<\/div> (Notepad++ "automatically" adds the / / around it, sort of).

What am I doing wrong here?

Answer 1

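You have a greedy dot quantifier in the part that matches the class attribute - that's the culprit. The .* consumes as much as it can and backtracks only as far as the last "> that still lets the rest of the pattern match, so the match runs from the first div to the last </div>.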

Make it non-greedy: <div class="(.*?[^"])"> or change it to a character class: <div class="([^"]*)">.
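
As a rough illustration outside Notepad++, the character-class variant can be tried in Python. This is only a sketch under assumptions: the replacement string is not given in the question and is made up here to produce the desired output, the sample HTML is shortened, and re.DOTALL stands in for Notepad++'s ". matches newline" option so that (.*?) can cross line breaks.

```python
import re

# Shortened sample of the callout markup from the question.
html = """<div class="note">
    <p><strong>Notes</strong>: Understanding notes is not critical.</p>
</div>
<div class="warning">
    <p><strong>Warnings</strong>: Pay attention to these warnings.</p>
</div>
"""

# [^"]* cannot run past the closing quote of the class attribute,
# so each match stays inside a single div.
pattern = re.compile(
    r'<div class="([^"]*)">[^<]*<p>(.*?)</p>[^<]*</div>',
    re.DOTALL,  # assumption: emulates ". matches newline"
)

# Hypothetical replacement: prepend "box" to the class and drop the <p> wrapper.
simplified = pattern.sub(r'<div class="box \1">\2</div>', html)
print(simplified)
```

In Notepad++ the same pattern and a similar replacement string would go into the Replace dialog with the regular expression search mode selected.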

Compare: greedy class vs. non-greedy class.
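
The difference between the three variants can also be reproduced with a small Python script. It is a sketch, not what Notepad++ runs internally: the sample HTML is shortened, the variable names are made up for illustration, and re.DOTALL is assumed to play the role of the ". matches newline" option.

```python
import re

# Two callouts are enough to show the over-matching.
html = """<div class="note">
    <p><strong>Notes</strong>: Understanding notes is not critical.</p>
</div>
<div class="warning">
    <p><strong>Warnings</strong>: Pay attention to these warnings.</p>
</div>
"""

# Shared remainder of the expression from the question.
tail = r'">[^<]*<p>(.*?)</p>[^<]*</div>'

greedy = re.compile(r'<div class="(.*[^"])' + tail, re.DOTALL)
lazy = re.compile(r'<div class="(.*?[^"])' + tail, re.DOTALL)
char_class = re.compile(r'<div class="([^"]*)' + tail, re.DOTALL)

# Greedy: one match that swallows everything up to the last div.
greedy_classes = [m.group(1) for m in greedy.finditer(html)]
print(len(greedy_classes), greedy_classes[0].endswith("warning"))

# Lazy and character class: one match per div, class name captured cleanly.
lazy_classes = [m.group(1) for m in lazy.finditer(html)]
char_class_classes = [m.group(1) for m in char_class.finditer(html)]
print(lazy_classes)
print(char_class_classes)
```

Of the two fixes, the character class is the stricter one: a lazy .*? may still grow past the closing quote if the rest of the pattern fails to match at the nearest position, whereas [^"]* can never cross a quote.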

regex

xhtml

notepad++