Regex substitution: replace text, not code
I've been trying to solve a regex quiz for several days and still can't get it right. I'm close, but my pattern still doesn't pass.
The task:
In an HTML page, replace the text micro with &micro;. Oh, and don't screw up the code: don't replace inside <the tags> or &entities;
Replace:
micro -> &micro;
abc micro -> abc &micro;
micromicro -> &micro;&micro;
&micro;micro -> &micro;&micro;
Don't touch:
<tag micro /> -> <tag micro />
&micro; -> &micro;
&abcmicro123; -> &abcmicro123;
I tried the pattern below, but the last &micro; case fails. What am I missing? Can anyone point it out? Thanks in advance!
What I tried:
Regex:
((?:\G|\n)(?:.*?&.*?micro.*?;[\s\S]*?|.*?<.*?micro.*?>[\s\S]*?|.)*?)micro
Substitution:
&micro;
You can try something like this:
(?:<.*?>|&\w++;)(*SKIP)(*F)|micro
Replacement string:
&micro;
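The (*SKIP)(*F) verbs are PCRE-specific, so the pattern above does not run in engines such as JavaScript's. A minimal sketch of the same idea for JavaScript, assuming the markup can be described by the same two branches: match tags and entities first, hand them back unchanged, and only rewrite the captured word. The helper name replaceOutsideMarkup is made up for illustration.

```javascript
// Sketch only: JavaScript regexes have no (*SKIP)(*F) verbs, so the
// "skip" branch is matched normally and returned unchanged by the callback.
// replaceOutsideMarkup is a hypothetical helper name.
function replaceOutsideMarkup(html) {
  // Branch 1 and 2 consume a tag or an entity; branch 3 captures the word.
  const pattern = /<[^<>]*>|&\w+;|(micro)/g;
  return html.replace(pattern, (match, word) =>
    // word is undefined when a tag or entity matched, so keep it as-is.
    word === undefined ? match : '&micro;'
  );
}

const samples = [
  'micro', 'abc micro', 'micromicro', '&micro;micro',
  '<tag micro />', '&micro;', '&abcmicro123;'
];
for (const s of samples) {
  console.log(s + ' -> ' + replaceOutsideMarkup(s));
}
```

Because the tag and entity branches come first in the alternation, any micro inside them is consumed before the third branch gets a chance to see it.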
Use the SKIP-FAIL technique, but match whole words only:
(?:<[^<>]*>|&\w+;)(*SKIP)(*F)|\bmicro\b
Explanation
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    <                        '<'
--------------------------------------------------------------------------------
    [^<>]*                   any character except: '<', '>' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ;                        ';'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (*SKIP)(*F)              Skip the match and go on matching from
                           current location
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  micro                    'micro'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
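One thing worth checking before adopting the whole-word variant: \b requires a word/non-word transition on both sides, so it appears to leave micromicro untouched, while the quiz expects that input to become &micro;&micro;. A small sketch of the boundary behaviour alone, in JavaScript, without the tag and entity handling; countMatches is a hypothetical helper.

```javascript
// Sketch only: isolates the effect of \b, ignoring tags and entities.
// countMatches is a hypothetical helper name.
function countMatches(re, text) {
  // String.prototype.match with a /g regex returns all matches, or null.
  return (text.match(re) || []).length;
}

const loose = /micro/g;          // matches "micro" anywhere
const wholeWord = /\bmicro\b/g;  // matches "micro" only as a separate word

console.log(countMatches(loose, 'micromicro'));     // 2
console.log(countMatches(wholeWord, 'micromicro')); // 0: no boundary between "o" and "m"
console.log(countMatches(wholeWord, 'abc micro'));  // 1
```

Whether whole-word matching is wanted depends on the task; for the test cases listed in the question, the pattern without \b is the one that matches them.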
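var strings = [
  "micro",
  "abc micro",
  "micromicro",
  "&micro;micro",
  "<tag micro />",
  "&micro;",
  "&abcmicro123;"
];
// Negative lookbehind: reject "micro" when it follows an unclosed "<" or "&".
// Group 1 sits inside the lookbehind, so the matched word is group 2.
var re = /(?<!(<[^>]*|&[^;]*))(micro)/g;
strings.forEach(function(str) {
  // '&$2;' wraps the captured word in "&" and ";" to form the entity.
  var result = str.replace(re, '&$2;');
  console.log(str + ' -> ' + result);
});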
Console log output:
micro -> &micro;
abc micro -> abc &micro;
micromicro -> &micro;&micro;
&micro;micro -> &micro;&micro;
<tag micro /> -> <tag micro />
&micro; -> &micro;
&abcmicro123; -> &abcmicro123;
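Explanation:
- (?<!...) is a negative lookbehind; it rejects any micro that sits inside a tag or an entity
- (<[^>]*|&[^;]*) is the pattern inside the lookbehind; it matches an unclosed <... or &... immediately before the word
- (micro) captures your word (add more as needed, e.g. (micro|brewery))
- '&$2;' is the replacement; it turns the captured word into an entity &...;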
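One caveat that follows from the lookbehind pattern: it treats every & without a later ; as the start of an entity, so a stray ampersand earlier in the text blocks the replacement. The sketch below contrasts it with an alternation that only skips well-formed &name; entities. Both function names are hypothetical, and the input R&D micro is an invented example, not one of the quiz's test cases.

```javascript
// Sketch only: compares two ways of protecting tags and entities.
// Both function names are hypothetical.

// Lookbehind version: rejects "micro" after any unclosed "<" or "&".
function viaLookbehind(text) {
  return text.replace(/(?<!(?:<[^>]*|&[^;]*))micro/g, '&micro;');
}

// Alternation version: skips only complete tags and complete entities.
function viaAlternation(text) {
  return text.replace(/<[^<>]*>|&\w+;|(micro)/g, (match, word) =>
    word === undefined ? match : '&micro;'
  );
}

const input = 'R&D micro';
console.log(viaLookbehind(input));  // R&D micro      (blocked by the bare "&")
console.log(viaAlternation(input)); // R&D &micro;
```

Both versions agree on the seven inputs from the question; they differ only on text that contains an unescaped & or <.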