Regex for string containing one string, but not another

Question

Our project has a regular expression that matches any URL containing the string "/pdf/":

(.+)/pdf/.+

It needs to be modified so that it does not match URLs that also contain "help".

Examples:

Should not match: "/dealer/help/us/en/pdf/simple.pdf"
Should match: "/dealer/us/en/pdf/simple.pdf"

Answer 1

(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)

First, we match a space or the start of the line:

(?:^|\s)

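Then we match anything that is not a space or an h, or any h that is not followed by elp, one or more times (+), until we find /pdf/. After that we match non-space characters (\S) any number of times (*):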

((?:[^h ]|h(?!elp))+\/pdf\/\S*)

If we also want to reject help after /pdf/, we can repeat the same sub-pattern after it:

((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)

Finally, we match a space or the end of the line/string ($):

(?:$|\s)

The full match will include the leading/trailing spaces, which should be stripped. If you use capture group 1, you don't need to strip the ends.
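As a rough sketch of how this could be used, here are both variants in Python, reading capture group 1 so that no stripping is needed. Python is an assumption here (the question does not name a regex engine), and the names NO_HELP_BEFORE_PDF, NO_HELP_ANYWHERE and extract_url are made up for illustration.

```python
import re

# Pattern from this answer: rejects "help" before /pdf/ only.
NO_HELP_BEFORE_PDF = re.compile(
    r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/\S*)(?:$|\s)"
)

# Variant with the sub-pattern repeated, so "help" after /pdf/ is rejected too.
NO_HELP_ANYWHERE = re.compile(
    r"(?:^|\s)((?:[^h ]|h(?!elp))+\/pdf\/(?:[^h ]|h(?!elp))+)(?:$|\s)"
)


def extract_url(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


# Group 1 excludes the surrounding spaces that the full match would contain.
print(extract_url(NO_HELP_BEFORE_PDF, " /dealer/us/en/pdf/simple.pdf "))
print(extract_url(NO_HELP_BEFORE_PDF, "/dealer/help/us/en/pdf/simple.pdf"))
print(extract_url(NO_HELP_ANYWHERE, "/dealer/us/en/pdf/help.pdf"))
```

With the first pattern, a URL such as /dealer/us/en/pdf/help.pdf is still accepted, because only the part before /pdf/ is checked; the second pattern rejects it.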

Answer 2

If lookarounds are supported, this is easy to achieve:

(?=.*/pdf/)(?!.*help)(.+)
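A minimal Python sketch of this approach follows, assuming each URL is tested on its own as a complete string. Python and the names PDF_WITHOUT_HELP and is_allowed are illustrative assumptions, not part of the question. Anchoring matters here: an unanchored search can start just after the h of help, where the negative lookahead no longer sees the word, so the sketch uses fullmatch.

```python
import re

# Pattern from this answer: must contain /pdf/ and must not contain help.
PDF_WITHOUT_HELP = re.compile(r"(?=.*/pdf/)(?!.*help)(.+)")


def is_allowed(url):
    """Return True if the whole URL contains /pdf/ and no 'help'."""
    # fullmatch starts at index 0, so both lookaheads scan the entire URL.
    return PDF_WITHOUT_HELP.fullmatch(url) is not None


print(is_allowed("/dealer/us/en/pdf/simple.pdf"))
print(is_allowed("/dealer/help/us/en/pdf/simple.pdf"))
```

If the engine only offers an unanchored search, prefixing the pattern with ^ should have the same effect for single-line input.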
