Notepad++ RegEx remove between tags when word matched

Question

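I have a similar issue to one I asked about before; this time I need to match on a keyword. Below is sample data from the KML file I am working with. I want to remove every Placemark that contains the word footway.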

    <Placemark>
        <styleUrl>#nothing</styleUrl>
        <ExtendedData>
            <SchemaData>
                <SimpleData>highway</SimpleData>
            </SchemaData>
        </ExtendedData>
        <LineString>
            <coordinates>0.0000,0.0000,0</coordinates>
        </LineString>
    </Placemark>
    <Placemark>
        <styleUrl>#nothing</styleUrl>
        <ExtendedData>
            <SchemaData>
                <SimpleData>footway</SimpleData>
            </SchemaData>
        </ExtendedData>
        <LineString>
            <coordinates>0.0000,0.0000,0</coordinates>
        </LineString>
    </Placemark>

I tried the following, but it captures everything from the first <Placemark> onward:

(?i)<Placemark>.*?footway.*?</Placemark>

Below are my Notepad++ settings:

Find what: (?i)<Placemark>.*?footway.*?</Placemark>
Replace with:
Wrap around
Search Mode: Regular expression & . matches newline

Answer 1

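This works for me in Notepad++ 6.9.2. It also works in this online Python regex tester: https://regex101.com/r/BYGvzo/1

Are you sure the options (Regular expression + . matches newline) are set correctly?

Edit: OK, after your edit the situation is different! I am not sure how to achieve this with a regex. I think it would be easier to parse the XML and then remove the nodes that contain the word footway.

See why: RegEx match open tags except XHTML self-contained tags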
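The parsing approach suggested above could look roughly like the following Python sketch. It is only an illustration under assumptions: the <Document> wrapper and the remove_placemarks helper are hypothetical, and the sample has no XML namespace, whereas real KML files usually declare one, in which case the tag lookup would need the namespace-qualified name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample modeled on the question's data.
SAMPLE = """<Document>
    <Placemark>
        <ExtendedData><SchemaData><SimpleData>highway</SimpleData></SchemaData></ExtendedData>
    </Placemark>
    <Placemark>
        <ExtendedData><SchemaData><SimpleData>footway</SimpleData></SchemaData></ExtendedData>
    </Placemark>
</Document>"""


def remove_placemarks(xml_text, keyword):
    """Drop every Placemark whose text content contains keyword (case-insensitive)."""
    root = ET.fromstring(xml_text)
    # Snapshot the element list first, because the tree is modified while looping.
    for parent in list(root.iter()):
        for placemark in list(parent.findall("Placemark")):
            text = "".join(placemark.itertext())
            if keyword.lower() in text.lower():
                parent.remove(placemark)
    return ET.tostring(root, encoding="unicode")


result = remove_placemarks(SAMPLE, "footway")
print(result)
```

Because the keyword is checked against the text of each parsed Placemark element, a match can never spill over into a neighboring block, which is the problem the regex attempts run into.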

Answer 2

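Here is one way to do it:

Find what: <Placemark>(?:(?!<Placemark).)*footway(?:.(?!<Placemark))*</Placemark>
Replace with: (leave empty)

This replaces every <Placemark> block that contains footway, and only those blocks.

(?!<Placemark) is a negative lookahead that ensures no other <Placemark> opening tag occurs before footway, so when the file has many <Placemark> blocks the regex matches only one block at a time.

(?:(?!<Placemark).)* is a non-capturing group repeated 0 or more times; each repetition consumes one character, but only at positions where <Placemark does not start.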
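To see the pattern from this answer in action outside Notepad++, here is a small Python sketch. Python's re module is not the same engine as the one in Notepad++, so treat this as an approximation; re.DOTALL stands in for the ". matches newline" option, and the three-block sample text is made up for the illustration.

```python
import re

# Same pattern as in the answer; DOTALL lets "." match newlines.
PATTERN = re.compile(
    r"<Placemark>(?:(?!<Placemark).)*footway(?:.(?!<Placemark))*</Placemark>",
    re.DOTALL,
)

# Hypothetical sample: only the middle block mentions footway.
text = (
    "<Placemark><SimpleData>highway</SimpleData></Placemark>\n"
    "<Placemark><SimpleData>footway</SimpleData></Placemark>\n"
    "<Placemark><SimpleData>path</SimpleData></Placemark>\n"
)

cleaned = PATTERN.sub("", text)
print(cleaned)
```

The first block cannot be part of a match: to reach footway from the first opening tag, the regex would have to step over the second <Placemark, which the lookahead forbids.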

Answer 3

Simplifying your file, it looks like the first line below, and your regex matches it as shown on the second line:

<Placemark> ... </Placemark> <Placemark> ...footway ... </Placemark>
<Placemark>    .*?                          footway .*? </Placemark>

You need to prevent the first </Placemark> from being included in the match.
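The over-matching can be reproduced with a short Python sketch, assuming Python's re module behaves like Notepad++ for this pattern. The one-line sample text is a simplified stand-in for the KML data.

```python
import re

# Simplified stand-in for the file: a highway block followed by a footway block.
text = "<Placemark> highway </Placemark> <Placemark> footway </Placemark>"

# The pattern from the question; DOTALL mirrors ". matches newline".
lazy = re.compile(r"(?i)<Placemark>.*?footway.*?</Placemark>", re.DOTALL)

match = lazy.search(text)
# The match begins at the FIRST <Placemark>, so it swallows the highway block too.
print(match.group(0))
```

The lazy quantifier only keeps the match as short as possible from its starting point; it does not move the starting point forward to the nearest <Placemark>.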

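If this is a one-off or rarely needed process, here is a method I sometimes use because it is very adaptable. Find a single character that does not occur anywhere in the file. For this example, use =. Do a replace-all of the regex (</?p)(lacemark>) with \1=\2, which inserts = between the two captured groups (with Match case unchecked, so the lowercase p also matches P). This gives the text: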

<P=lacemark> ... </P=lacemark> <P=lacemark> ...footway ... </P=lacemark>

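Then do a replace-all of the regex <P=lacemark>[^=]*footway[^=]*</P=lacemark> with nothing. Because [^=]* cannot cross a marked tag, each match stays inside a single block. Finally, remove all the = characters with another replace-all.

If no single character is free to use (that is, something to take the place of =), do a few replacements before the steps above to free one up. For example, first replace all & with &amp; and then replace all = with &eq;. Now = is free to use. After completing the steps above, undo the replacements in reverse order: first replace all &eq; with =, then replace all &amp; with &.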
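The marker-character steps above can be sketched in Python as follows. This is an illustration only: the remove_blocks helper and the sample text are hypothetical, and Python's \1=\2 replacement syntax is assumed to correspond to what Notepad++ does in its Replace dialog.

```python
import re


def remove_blocks(text, keyword="footway"):
    """Remove Placemark blocks containing keyword, using '=' as a temporary marker."""
    # Free up "=": escape "&" first so the undo step at the end is unambiguous.
    work = text.replace("&", "&amp;").replace("=", "&eq;")
    # Mark every opening and closing Placemark tag: <Placemark> -> <P=lacemark>.
    work = re.sub(r"(</?p)(lacemark>)", r"\1=\2", work, flags=re.IGNORECASE)
    # [^=]* cannot cross a marked tag, so each match stays inside one block.
    block = r"<P=lacemark>[^=]*" + re.escape(keyword) + r"[^=]*</P=lacemark>"
    work = re.sub(block, "", work)
    # Remove the markers, then undo the escaping in reverse order.
    work = work.replace("=", "")
    return work.replace("&eq;", "=").replace("&amp;", "&")


# Hypothetical sample that already contains "=" and "&".
sample = (
    '<Placemark><name id="a">A &amp; B highway</name></Placemark> '
    "<Placemark><name>footway</name></Placemark>"
)
result = remove_blocks(sample)
print(result)
```

The escaping round trip is what allows = to serve as the marker even though the sample contains it in an attribute.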

regex

notepad++

kml