Mimicking negative lookbehind to match a pattern not immediately preceded by a specific character in JavaScript regex

Question

I have this regular expression in JavaScript:

/0x[A-F0-9]{2}/g

I would like to modify it so that it matches only when the preceding character is not a \. Something like:

0x60 -> true
\0x60 -> false

I came up with something like this, but it does not work properly:

/[^\\]0x[A-F0-9]{2}/g

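It matches when preceded by anything except \, but the preceding character becomes part of the match. By "anything" I mean:

a0x50 -> true, including "a"
_0x50 -> true, including "_"
...
\0x50 -> false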

Regex example: regex101, followed by a Plnkr.

Is this possible to achieve? Thanks.

Answer 1

JavaScript does not support lookbehinds, and as you have already found, the following consumes one extra character (the character before 0x):

/[^\\]0x[A-F0-9]{2}/g

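You could use an ugly hack such as:

'\\0x25 0x60'.match(/([^\\]|^)0x[A-F0-9]{2}/g).map(function(val) {
  // Keep only the trailing 0xHH, dropping the consumed leading character if there is one.
  return val.slice(-4);
});
// ['0x60']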

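This consumes the leading character, but removes it again while iterating over the array of matches.

However, it makes input such as 0x600x60 give ['0x60'] instead of ['0x60', '0x60'], because the character before the second 0x60 has already been consumed by the first match.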
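
To make the limitation concrete, here is a minimal sketch that wraps the hack in a function and runs it on both inputs. The name findUnescaped is made up for illustration, and slice(-4) assumes every match ends in a four-character 0xHH token.

```javascript
// Hypothetical helper: collect 0xHH tokens not preceded by a backslash.
function findUnescaped(input) {
  // The group consumes the preceding character (or matches at string start).
  var matches = input.match(/(?:[^\\]|^)0x[A-F0-9]{2}/g) || [];
  return matches.map(function (val) {
    // Assumption: the token itself is always the last four characters.
    return val.slice(-4);
  });
}

var escapedCase = findUnescaped('\\0x25 0x60');
var adjacentCase = findUnescaped('0x600x60');
console.log(escapedCase);  // ['0x60']
console.log(adjacentCase); // ['0x60'], the second token is missed
```

The second call shows why the approach breaks down for adjacent tokens: the final 0 of the first match is the only character that could precede the second token, and it is already used up.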

Answer 2

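The gist is to match the pattern you would normally put inside a negative lookbehind with an optional capturing group, and then check whether that group matched anything. If it did, you do not want the match; otherwise, use it.

If you need to match and collect the substrings, use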

var re = /(\\?)0x[A-F0-9]{2}/gi;
var str = '\\0x50 0x60 asdasda0x60';
var res = [];
var m;
while ((m = re.exec(str)) !== null) {
  // Group 1 is empty when no backslash precedes the match.
  if (!m[1]) {
    res.push(m[0]);
  }
}
document.body.innerHTML = "TEST: " + str + "<br/>";
document.body.innerHTML += "RES: " + JSON.stringify(res, null, 4) + "<br/>";

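If you only need to replace the strings that have no \ before the 0x.., use a callback in the replace method that checks whether group 1 matched. If it did, put the whole match back unchanged; if not, replace it with whatever you need.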

var re = /(\\?)0x[A-F0-9]{2}/gi;
var str = '\\0x50 0x60 asdasda0x60';
var res = str.replace(re, function (m, group1) {
  // Keep escaped matches unchanged; replace the others.
  return group1 ? m : "NEW_VAL";
});
document.body.innerHTML = "TEST: " + str + "<br/>";
document.body.innerHTML += "RES: " + res + "<br/>";

Answer 3

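You can match both the bad and the good ones.
That keeps the matching aligned on all of them, so you won't miss any.

(?:\\0x[A-F0-9]{2}|(0x[A-F0-9]{2}))

In this case, only the good ones end up in capture group 1.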

 (?:
      \\ 0x [A-F0-9]{2}     # Bad
   |
      ( 0x [A-F0-9]{2} )    # (1), Good
 )
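
As a rough sketch of how this could be used from JavaScript (the function name collectGood and the test string are illustrative assumptions, not part of the pattern above), you can loop with exec and keep only the matches where group 1 is set:

```javascript
// Hypothetical helper: the escaped ("bad") alternative is tried first and
// discarded; only the unescaped ("good") alternative fills capture group 1.
function collectGood(input) {
  var re = /\\0x[A-F0-9]{2}|(0x[A-F0-9]{2})/g;
  var result = [];
  var m;
  while ((m = re.exec(input)) !== null) {
    // m[1] is undefined when the "bad" alternative matched.
    if (m[1] !== undefined) {
      result.push(m[1]);
    }
  }
  return result;
}

var good = collectGood('\\0x50 0x60 0x600x60');
console.log(good); // ['0x60', '0x60', '0x60']
```

Because nothing in front of a good token is consumed, adjacent tokens such as 0x600x60 are both found.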

Answer 4

This will do it:

(?:[^\\]|^)0x[A-F0-9]{2}

var myregexp = /(?:[^\\]|^)0x[A-F0-9]{2}/mg;
var subject = '0x60 \\0x99 0x60 \\0x99 0x60 0x60';
var match = myregexp.exec(subject);
while (match != null) {
  // Print the whole match, followed by any capture groups.
  for (var i = 0; i < match.length; i++) {
    document.body.innerHTML += match[i] + "<br/>";
  }
  match = myregexp.exec(subject);
}

Regex explanation:

(?:[^\\]|^)0x[A-F0-9]{2}

Match the regular expression below «(?:[^\\]|^)»
   Match this alternative (attempting the next alternative only if this one fails) «[^\\]»
      Match any character that is NOT the backslash character «[^\\]»
   Or match this alternative (the entire group fails if this one fails to match) «^»
      Assert position at the beginning of a line (at beginning of the string or after a line break character) (line feed, carriage return, line separator, paragraph separator) «^»
Match the character string “0x” literally (case sensitive) «0x»
Match a single character present in the list below «[A-F0-9]{2}»
   Exactly 2 times «{2}»
   A character in the range between “A” and “F” (case sensitive) «A-F»
   A character in the range between “0” and “9” «0-9»
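
One thing to keep in mind with this pattern is that the character matched by [^\\] is part of the overall match. A minimal sketch, assuming you only want the 0xHH part, is to wrap it in a capturing group and read group 1 (the extra group and the variable names are illustrative additions, not part of the pattern above):

```javascript
// Same pattern, with a capturing group added around the 0xHH token.
var re = /(?:[^\\]|^)(0x[A-F0-9]{2})/mg;
var subject = '0x60 \\0x99 0x60';
var whole = [];   // full matches, possibly including the preceding character
var hexOnly = []; // capture group 1, just the token
var match;
while ((match = re.exec(subject)) !== null) {
  whole.push(match[0]);
  hexOnly.push(match[1]);
}
console.log(whole);   // ['0x60', ' 0x60']
console.log(hexOnly); // ['0x60', '0x60']
```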

Answer 5

If you are using Node, or are willing to turn on a browser flag (from here), you are in luck:

Lookbehind assertions are currently in a very early stage in the TC39 specification process. However, because they are such an obvious extension to the RegExp syntax, we decided to prioritize their implementation. You can already experiment with lookbehind assertions by running V8 version 4.9 or later with --harmony, or by enabling experimental JavaScript features (use about:flags) in Chrome from version 49 onwards.

So now it is, of course, just

/(?<!\\)0x[A-F0-9]{2}/g
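
Lookbehind assertions have since been standardized in ECMAScript 2018, so engines that implement them accept this pattern without any flag. A minimal sketch, assuming such an engine (for example a recent Node.js or Chrome) and an illustrative test string:

```javascript
// Negative lookbehind: match 0xHH only when no backslash comes right before it.
var re = /(?<!\\)0x[A-F0-9]{2}/g;
var str = '\\0x50 0x60 0x600x60';
var found = str.match(re);
console.log(found); // ['0x60', '0x60', '0x60']
```

Since the lookbehind is zero-width, nothing before the token is consumed and adjacent tokens are matched as well.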

There are other ways to emulate lookbehind in this answer. My favorite is to reverse the string and use a lookahead.

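// The pattern is written backwards and the lookbehind becomes a lookahead.
var re = /[A-F0-9]{2}x0(?!\\)/g;
var str = "0x60 \\0x33";

function reverse(s) { return s.split('').reverse().join(''); }

// Reverse the input, match, then reverse each match back.
document.write(reverse(str).match(re).map(reverse));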
