Match a given regex except if a given word exists (lookahead or lookbehind)
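I'm using JavaScript regex to parse a series of URLs. I need to match the number in the URL (it's actually more complex, but I'm simplifying), but I only want to match numbers in URLs that do not contain a given word.
That is, I want to exclude lines that contain the word 'changelogs', so that '1047', '1048', '1245' and '1049' are captured from the following list: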
http://www.opera.com/docs/changelogs/unified/1215/
http://www.whatever.com/docs/changelogs/anythingelse/anything/1215/
http://www.blabblah/security/advisory/1047
http://booger/security/advisory/1048/
ftp://msn.global.whatever/somethingelse/1245
whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/
I know I need some kind of lookaround (lookahead or lookbehind), but I keep striking out. Here's the last pattern I tried:
(?!changelogs)(\d+)
Here is the regex101 sandbox I'm using.
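Why the last attempt fails: a lookahead only inspects the text starting at the current position, so `(?!changelogs)` placed directly before `\d+` is tested at a digit, where "changelogs" can never begin. It therefore never rejects anything and the pattern returns every number. A quick JavaScript check against the sample list (the variable names are illustrative):

```javascript
// Sample URLs from the question, one per line.
const urls = [
  "http://www.opera.com/docs/changelogs/unified/1215/",
  "http://www.whatever.com/docs/changelogs/anythingelse/anything/1215/",
  "http://www.blabblah/security/advisory/1047",
  "http://booger/security/advisory/1048/",
  "ftp://msn.global.whatever/somethingelse/1245",
  "whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/",
].join("\n");

// The attempted pattern: the lookahead is checked at the same spot as \d+,
// i.e. at a digit, so it always succeeds.
const naive = urls.match(/(?!changelogs)(\d+)/g);
console.log(naive); // ["1215", "1215", "1047", "1048", "1245", "1049"]
```

Both changelog numbers (1215) come back, which is exactly what the question is trying to avoid.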
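It's also important that the only thing matched is the actual number. I don't want any other matches.
Here's what my .NET code looks like (note that "BulletinOrAdvisoryPattern" is the regex in question)...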
Regex bulletinPattern = new Regex(@matchingDomain.Vendor.BulletinOrAdvisoryPattern, RegexOptions.IgnoreCase);
Match bulletinMatch = bulletinPattern.Match(referenceTitle);
if (bulletinMatch.Success)
{
    // Found the bulletin ID in the NVD Reference Title
    return bulletinMatch.Value;
}
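Something like the following should do it. If you are interested in more than just opera, you can make it more generic by replacing opera with .+ in the pattern. You can also match things like com and net by using something like (com|net|org|gov) in place of com: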
http:\/\/www\.opera\.com(?!.*changelogs)(\/[^\/]+)*\/(\d+)\/{0,1}
This pattern excludes lines that contain 'changelogs' and finds the last occurrence of a number enclosed in slashes.
(?:\/)(?!.*changelogs)(?:\/[^\/]+)*\/(\d+)\/{0,1}
Here is the updated regex101.
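A sketch of using the opera.com pattern from this answer in JavaScript, testing one URL at a time. Because the whole URL is part of the match, the number has to be read from capture group 2 (group 1 is the repeated `(\/[^\/]+)*` segment). `idFromOperaUrl` is a hypothetical helper name:

```javascript
// Pattern from the answer; \/{0,1} is the same as \/? (optional trailing slash).
const operaPattern = /http:\/\/www\.opera\.com(?!.*changelogs)(\/[^\/]+)*\/(\d+)\/{0,1}/;

// Hypothetical helper: returns the trailing number, or null when the URL does
// not match (which includes every URL containing "changelogs").
function idFromOperaUrl(url) {
  const m = operaPattern.exec(url);
  return m ? m[2] : null;
}

console.log(idFromOperaUrl("http://www.opera.com/security/advisory/1048/")); // 1048
console.log(idFromOperaUrl("http://www.opera.com/docs/changelogs/unified/1215/")); // null
```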
The "ugly" regex you need is
(?<=http://www\.opera\.com\b(?!.*/changelogs(?:/|$))\S*)\d+
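The pattern above relies on .NET's variable-length lookbehind, so the whole match is just the number, which is what `Match.Value` in the question's code returns. JavaScript engines that implement ES2018 lookbehind assertions accept a similar construct (older engines throw a SyntaxError). The sketch below is a hypothetical generalization, not the answer's exact pattern: it drops the opera.com prefix and anchors on the start of each line instead.

```javascript
// Requires ES2018 lookbehind support (current browsers and Node.js).
// The lookbehind demands that, on the same line, the digits are preceded by
// text that starts at ^, contains no "/changelogs/" (or trailing "/changelogs"),
// and ends with a slash. The m flag makes ^ match at each line start.
const idPattern = /(?<=^(?!.*\/changelogs(?:\/|$)).*\/)\d+/gm;

const urls = [
  "http://www.opera.com/docs/changelogs/unified/1215/",
  "http://www.whatever.com/docs/changelogs/anythingelse/anything/1215/",
  "http://www.blabblah/security/advisory/1047",
  "http://booger/security/advisory/1048/",
  "ftp://msn.global.whatever/somethingelse/1245",
  "whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/",
].join("\n");

const ids = urls.match(idPattern);
console.log(ids); // ["1047", "1048", "1245", "1049"]
```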
However, all you really need is
var result = input.Contains("/changelogs/") ? "" : input.Trim('/').Split('/').LastOrDefault();
using System;
using System.Collections.Generic;
using System.Linq;

// Demo: prints an empty line for changelog URLs, otherwise the last path segment
var lst = new List<string>() {"http://www.opera.com/docs/changelogs/unified/1215/",
    "http://www.opera.com/docs/changelogs/anythingelse/anything/1215/",
    "http://www.opera.com/security/advisory/1047",
    "http://www.opera.com/security/advisory/1048/",
    "http://www.opera.com/doesnt/matter/could/be/anything/1049/"};
lst.ForEach(m => Console.WriteLine(
    m.Contains("/changelogs/") ? "" : m.Trim('/').Split('/').LastOrDefault()
));
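The same non-regex idea, sketched in JavaScript. `lastSegment` is a hypothetical name; like the C# `Contains`, `String.prototype.includes` is case-sensitive:

```javascript
// Hypothetical helper mirroring the C# one-liner: empty string for changelog
// URLs, otherwise the last path segment.
function lastSegment(url) {
  if (url.includes("/changelogs/")) return "";
  // Equivalent of C# Trim('/'): strip all leading and trailing slashes.
  const trimmed = url.replace(/^\/+|\/+$/g, "");
  return trimmed.split("/").pop();
}

console.log(lastSegment("http://booger/security/advisory/1048/")); // 1048
```

Like the C# version, this returns the last segment whether or not it is numeric, so it assumes every non-changelog URL ends in the ID.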
Update
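You switched the language from C# to JavaScript, which changes things drastically, because the JS regex engine did not support lookbehind (lookbehind assertions were only added to JavaScript in ES2018).
So you have to work around it: there are ways to mimic lookbehind, or you can simply use capturing groups.
If you can use capturing, try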
/^(?!.*\/changelogs(?:\/|$)).*\/(\d+)/
var re = /^(?!.*\/changelogs(?:\/|$)).*\/(\d+)/gmi;
var str = 'http://www.opera.com/docs/changelogs/unified/1215/\nhttp://www.whatever.com/docs/changelogs/anythingelse/anything/1215/\nhttp://www.blabblah/security/advisory/1047\nhttp://booger/security/advisory/1048/\nftp://msn.global.whatever/somethingelse/1245\nwhatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/';
var res = [];
var m;
// With the g flag, exec() advances lastIndex, so the loop visits each matching line.
while ((m = re.exec(str)) !== null) {
    res.push(m[1]); // group 1 holds only the number
}
document.body.innerHTML = JSON.stringify(res, null, 4);
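The question's .NET code matches one reference title at a time rather than a multi-line block. A sketch of the capturing pattern applied to a single URL (`extractBulletinId` is a hypothetical name): without the `g` flag `exec` needs no loop, and without `m` the `^` anchors at the start of the string. The `i` flag mirrors `RegexOptions.IgnoreCase`.

```javascript
// Same pattern as above, minus the g and m flags: one URL per call.
const bulletinRe = /^(?!.*\/changelogs(?:\/|$)).*\/(\d+)/i;

// Returns the captured number, or null for changelog URLs and for URLs
// with no "/<digits>" segment.
function extractBulletinId(url) {
  const m = bulletinRe.exec(url);
  return m ? m[1] : null;
}

console.log(extractBulletinId("ftp://msn.global.whatever/somethingelse/1245")); // 1245
```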
Alternatively, use an optional group (if you are doing a replacement):
var re = /(\/changelogs\/.*)?\/(\d+)/gi;
var str = 'http://www.opera.com/docs/changelogs/unified/1215/\nhttp://www.whatever.com/docs/changelogs/anythingelse/anything/1215/\nhttp://www.blabblah/security/advisory/1047\nhttp://booger/security/advisory/1048/\nftp://msn.global.whatever/somethingelse/1245\nwhatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/';
// g1 is set only when the match started at "/changelogs/"; leave those untouched.
// Otherwise the match is "/<digits>", so keep the slash in front of the new value.
var result = str.replace(re, function (m, g1, g2) {
    return g1 ? m : "/NEW_VAL";
});
document.body.innerHTML = result;
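The optional-group trick also works for extraction, not just replacement: the unwanted "/changelogs/..." part is consumed in group 1, and group 2 is kept only when group 1 did not participate. A sketch using `String.prototype.matchAll` (ES2020); it assumes, as in every sample URL, that "/changelogs/" appears before the number on its line.

```javascript
const re = /(\/changelogs\/.*)?\/(\d+)/gi;
const urls = [
  "http://www.opera.com/docs/changelogs/unified/1215/",
  "http://www.whatever.com/docs/changelogs/anythingelse/anything/1215/",
  "http://www.blabblah/security/advisory/1047",
  "http://booger/security/advisory/1048/",
  "ftp://msn.global.whatever/somethingelse/1245",
  "whatever/it/doesnt/matter/could/be/anything/i/still/want/this/number/1049/",
].join("\n");

const ids = [];
for (const m of urls.matchAll(re)) {
  // m[1] is undefined unless this match began at "/changelogs/".
  if (m[1] === undefined) ids.push(m[2]);
}
console.log(ids); // ["1047", "1048", "1245", "1049"]
```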