Selectively replace words with regular expressions

Question

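Is it possible to selectively replace certain words using a regular expression?

My document contains several lines like this:

<type>xxx</type>

where xxx can be bug, improvement, newfeature, or one of a few other values.

I want to convert each of them to: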

"kind":"yyy",

where yyy = xxx, except that improvement should be replaced with enhancement and newfeature with proposal. In all other cases, yyy should be the same as xxx.

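The straightforward regex replacement would turn <type>([^<]+)</type> into "kind":"$1", but is it possible to replace the two special words at the same time?

I believe I am using the PCRE engine.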

Answer 1

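It is not possible to put a conditional statement in the replacement string, or to store data (that is not in the string) in the pattern itself.

With Sublime Text, the simpler way is obviously to do it in several steps (replace the special strings first, then the general case). The proper way is to use a programming language and an XML parser.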
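As a rough illustration of the "programming language" route, here is a minimal Python sketch. It assumes the input is plain text with one <type>...</type> element per occurrence and uses a regex callback rather than a real XML parser; the names SPECIAL_TERMS and convert_type_tags are made up for this example.

```python
import re

# Hypothetical mapping for this sketch; terms not listed are kept unchanged.
SPECIAL_TERMS = {"improvement": "enhancement", "newfeature": "proposal"}

TYPE_TAG = re.compile(r"<type>([^<]+)</type>")


def convert_type_tags(text):
    """Replace every <type>xxx</type> with "kind":"yyy", in text."""

    def replace(match):
        term = match.group(1)
        # Fall back to the original term when it is not a special one.
        return '"kind":"{}",'.format(SPECIAL_TERMS.get(term, term))

    return TYPE_TAG.sub(replace, text)


sample = "<type>bug</type>\n<type>improvement</type>\n<type>newfeature</type>"
print(convert_type_tags(sample))
```

Because the callback decides the replacement per match, the special terms and the general case are handled in one pass. For a document that is real, well-formed XML, a proper parser would be the safer choice.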

However, it is possible to do it in a single Replace All with a trick:

1) Add this line at the very end of the file (on a new line):

#improvement:enhancement#newfeature:proposal#"kind":"

2) Use this pattern:

<type>(?|([^<]+)</type>(?=(?:.*\R)++#(?>[^:]+:[^#]+#)??\1:([^#]++).*#((.).*))|(([^<]+))</type>(?=(?:.*\R)++.*#((.).*)))|\R.*\z

with this replacement:

$3$2$4

($3 stands for "kind":" or nothing, $2 for enhancement, proposal, xxx or nothing, and $4 for " or nothing.)

3) Replace all


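The idea is simply to put all the replacements in the string itself and to use a branch reset group (?|.(..).|.(..).) in the pattern (with this feature, the capture groups in each alternative share the same numbers). The added line is removed automatically: the last alternative, \R.*\z, matches it with all groups empty, so it is replaced with nothing.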

Note: if you have more than two special terms to replace, extend the last line (but "kind":" must stay at the end) and change the ?? to *? in the pattern.
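The lookup mechanism itself (a lookahead that jumps to the table line and a backreference that finds the matching entry) can be sketched in Python. This is not the single-pass PCRE pattern: Python's built-in re module has no branch reset group or \R, so this simplified version runs three passes and leaves out the "kind":" entry of the table. The names TABLE and convert_with_table are made up for the example.

```python
import re

# Lookup table appended as the last line (assumption: same term:replacement
# format as in the answer, minus the trailing "kind":" entry).
TABLE = "#improvement:enhancement#newfeature:proposal#"

# Special terms: the lookahead jumps to the table line, lazily skips whole
# term:replacement entries, and requires the entry whose term equals group 1.
SPECIAL = re.compile(
    r"<type>([^<]+)</type>"
    r"(?=[\s\S]*\n#(?:[^:#]+:[^#]+#)*?\1:([^#]+)#)"
)
GENERAL = re.compile(r"<type>([^<]+)</type>")
TABLE_LINE = re.compile(r"\n#.*\Z")


def convert_with_table(text):
    augmented = text + "\n" + TABLE
    # Pass 1: terms found in the table get their replacement (group 2).
    augmented = SPECIAL.sub(r'"kind":"\2",', augmented)
    # Pass 2: every remaining term is kept as is (group 1).
    augmented = GENERAL.sub(r'"kind":"\1",', augmented)
    # Pass 3: drop the table line again.
    return TABLE_LINE.sub("", augmented)


sample = "<type>bug</type>\n<type>improvement</type>\n<type>newfeature</type>"
print(convert_with_table(sample))
```

Skipping whole entries before the backreference is what prevents a partial hit: a term such as new cannot match inside newfeature, because the backreference must be followed directly by a colon.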

Pattern details:

<type>
(?|                        # open a branch reset group
                           # first branch: the special terms
    ([^<]+)                # capture the term in group 1
    </type>
    (?=                    # open a lookahead (nothing is consumed inside it)
        (?:.*\R)++ #       # reach the last line
        (?>[^:]+:[^#]+#)?? # skip a couple of term:repl if needed
        \1                 # until the content of group 1 is found
        : ([^#]++)         # capture the corresponding replacement
        .* #               # reach the last #
        ((.).*)            # capture '"kind":"' in group 3 and '"' in group 4
    )                      # close the lookahead
  |                        # OR second branch: the general case
    (([^<]+))              # capture the term in group 1 and 2
                           # (to have the same numbers as the previous branch)
    </type>
    (?=                    # open a lookahead
        (?:.*\R)++         # same thing as the previous branch
        .* #               # but this time only '"kind":"' and '"'
        ((.).*)            # are needed
    )
)                          # close the branch reset group
|                          # OR
\R.*\z                     # the last line (in this case all the
                           # groups are empty)

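\R is an alias for several types of newline sequences (whatever the system).

(?>....) is an atomic group.

++, *+, ?+ are possessive quantifiers.

\z is the anchor for the end of the string.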


regex

pcre

sublimetext3