How to negate a word using regex in a URL path in HTML?

Question

Hello, I would like to capture URL paths that do not contain a language code. For example, given:

<a href="2nd_stage_short_1.html">abc</a>
<a href="2nd_stage_short_1_sc.html">abc</a>
<a href="2nd_stage_short_1_tc.html">abc</a>
<a href="major.html">abc</a>
<a href="detail.html">abc</a>

I want to capture only the URLs without a language code:

<a href="2nd_stage_short_1.html">abc</a>
<a href="major.html">abc</a>
<a href="detail.html">abc</a>

I tried the regular expression

\w+(?!sc|tc).html

at http://www.regexr.com/, but every URL path is captured. If there is a mistake in my regex, I would be glad to hear it. Thanks.

Answer 1

I don't think regex should be the whole solution here. Treat the task as 2 phases and write code that is easy to read.

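1. Use the regex \w+\.html to match the contents of every href attribute, and assign the matches to an array.
2. Loop over the resulting array and, on each iteration, test for _sc or _tc with _(s|t)c to decide whether the item should be added to a second array that holds only the "non-language" page names.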
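
The two phases above could be sketched roughly as follows. This is a minimal Python illustration, not a definitive implementation: the names SAMPLE_HTML, HREF_PATTERN, LANGUAGE_SUFFIX and pages_without_language_code are made up for this example, and it assumes every href is double-quoted and holds a bare file name as in the question.

```python
import re

# Sample markup copied from the question.
SAMPLE_HTML = """
<a href="2nd_stage_short_1.html">abc</a>
<a href="2nd_stage_short_1_sc.html">abc</a>
<a href="2nd_stage_short_1_tc.html">abc</a>
<a href="major.html">abc</a>
<a href="detail.html">abc</a>
"""

# Phase 1: capture the page name inside each href attribute.
HREF_PATTERN = re.compile(r'href="(\w+\.html)"')

# Phase 2: recognise a language suffix (_sc or _tc) right before ".html".
LANGUAGE_SUFFIX = re.compile(r"_(s|t)c\.html$")


def pages_without_language_code(html):
    """Return the page names in html that carry no language suffix."""
    all_pages = HREF_PATTERN.findall(html)  # phase 1: every page name
    non_language_pages = []
    for page in all_pages:  # phase 2: filter one item at a time
        if not LANGUAGE_SUFFIX.search(page):
            non_language_pages.append(page)
    return non_language_pages


print(pages_without_language_code(SAMPLE_HTML))
```

Supporting another language later would only mean changing LANGUAGE_SUFFIX; the matching in phase 1 stays untouched.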

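If a new language suffix is introduced later, the resulting code will be easy to understand and modify. It can be commented well, because each line does one thing. Ultimately, your teammates and your future self will appreciate the clarity when trying to understand the code.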
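
As for why the pattern in the question captures everything: the lookahead (?!sc|tc) is only checked after \w+ has already consumed the suffix, so at that point it sees ".html" and always passes. The sketch below, using Python's re module with illustrative variable names, shows this, along with a single-pattern alternative based on a negative lookbehind for readers who still prefer one regex. It assumes the only suffixes are _sc and _tc.

```python
import re

names = [
    "2nd_stage_short_1.html",
    "2nd_stage_short_1_sc.html",
    "2nd_stage_short_1_tc.html",
    "major.html",
    "detail.html",
]

# The pattern from the question: \w+ swallows "_sc" / "_tc" first,
# so the lookahead is tested in front of ".html" and never rejects.
original = re.compile(r"\w+(?!sc|tc).html")
matched_by_original = [n for n in names if original.fullmatch(n)]

# Alternative: look *behind* the dot instead, and escape the dot.
# Both alternatives are three characters wide, which Python's
# fixed-width lookbehind requires.
lookbehind = re.compile(r"\w+(?<!_sc|_tc)\.html")
matched_by_lookbehind = [n for n in names if lookbehind.fullmatch(n)]

print(matched_by_original)    # all five names
print(matched_by_lookbehind)  # only the names without a language code
```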


html

regex-negation