Regex substitution: replace text, not code

I have been trying to solve a regex quiz for several days and still cannot get it right. I am very close, but my answer still does not pass.

Task:

In an HTML page, replace the text micro with &micro;. Oh, and don't screw up the code: don't replace inside <the tags> or &entities;


I tried the pattern below, but it fails on the last case, &micro;. Can someone point out what I am missing? Thanks in advance!

What I tried:

Regex

((?:\G|\n)(?:.*?&.*?micro.*?;[\s\S]*?|.*?<.*?micro.*?>[\s\S]*?|.)*?)micro

Substitution

&micro;

You could try something like this:

(?:<.*?>|&\w++;)(*SKIP)(*F)|micro

Replacement string:

&micro;

Use the SKIP-FAIL technique, but match whole words only:

(?:<[^<>]*>|&\w+;)(*SKIP)(*F)|\bmicro\b

Explanation

--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    <                        '<'
--------------------------------------------------------------------------------
    [^<>]*                   any character except: '<', '>' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ;                        ';'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (*SKIP)(*F)              Skip the match and go on matching from current location
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  micro                    'micro'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
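
The (*SKIP)(*F) verbs are PCRE features and are not available in JavaScript's regex engine. A rough equivalent there, as a minimal sketch, is to put the "do not touch" alternatives first and decide in a replacement callback: tags and entities are matched and returned unchanged, and only a whole-word micro reaches the capture group. The function name replaceMicro is hypothetical and used only for illustration.

```javascript
// Sketch of the SKIP-FAIL idea without PCRE verbs.
// Alternatives 1 and 2 consume tags and entities; alternative 3 captures the word.
function replaceMicro(html) {
    var re = /<[^<>]*>|&\w+;|\b(micro)\b/g;
    return html.replace(re, function (match, word) {
        // `word` is undefined when a tag or an entity matched: return it unchanged.
        return word ? '&' + word + ';' : match;
    });
}

console.log(replaceMicro('micro <tag micro /> &micro; &abcmicro123; micro'));
// -> &micro; <tag micro /> &micro; &abcmicro123; &micro;
```

Because of the \b anchors, this variant leaves micromicro alone, just as the whole-word PCRE pattern does.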

var strings = [
    "micro",
    "abc micro",
    "micromicro",
    "&micro;micro",
    "<tag micro />",
    "&micro;",
    "&abcmicro123;"
];
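// Group 1 sits inside the negative lookbehind and never captures; group 2 is the matched word.
var re = /(?<!(<[^>]*|&[^;]*))(micro)/g;
strings.forEach(function(str) {
    // $2 inserts the captured word, so "micro" becomes "&micro;"
    var result = str.replace(re, '&$2;');
    console.log(str + ' -> ' + result);
});

Console log output:

micro -> &micro;
abc micro -> abc &micro;
micromicro -> &micro;&micro;
&micro;micro -> &micro;&micro;
<tag micro /> -> <tag micro />
&micro; -> &micro;
&abcmicro123; -> &abcmicro123;

Explanation:

  • (?<!...) - a negative lookbehind that rejects micro when it sits inside a tag or an entity
  • (<[^>]*|&[^;]*) - inside the lookbehind, matches an unclosed <... or &... immediately before the word
  • (micro) - captures the name to replace (add more as needed, e.g. (micro|brewery))
  • '&$2;' - the replacement turns the captured name into an entity, &...;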
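
One edge case may be worth checking with the lookbehind approach: the &[^;]* branch treats everything after any & as "inside an entity" until a ; appears, so a bare ampersand in running text suppresses later replacements. The sketch below compares the pattern above with a stricter variant that assumes entity names consist of word characters only; the names loose, strict, and toEntity are hypothetical.

```javascript
// Pattern from the answer above: any text after '&' without ';' counts as an entity.
var loose = /(?<!(<[^>]*|&[^;]*))(micro)/g;
// Assumed tightening: only word characters may sit between '&' and the match.
var strict = /(?<!(<[^>]*|&\w*))(micro)/g;

// Group 2 is the matched word in both patterns.
function toEntity(str, re) {
    return str.replace(re, '&$2;');
}

console.log(toEntity('AT&T micro', loose));     // -> AT&T micro
console.log(toEntity('AT&T micro', strict));    // -> AT&T &micro;
console.log(toEntity('&abcmicro123;', strict)); // -> &abcmicro123;
```

The stricter variant still skips a word glued directly to an ampersand (for example R&Dmicro), so it narrows the problem rather than removing it.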