Regex negative lookahead string with special character python

Question

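This relates to the content dimensions of a website. This link checker tool supports Python regular expressions. With the link checker, I only want to get information about one content dimension.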

I want to match everything except the string de_de (for the --no-follow-url option).

https://www.example.com/int_en
https://www.example.com/int_de
https://www.example.com/de_de  ## should NOT match; all the others should match
https://www.example.com/be_de
https://www.example.com/fr_fr
https://www.example.com/gb_en
https://www.example.com/us_en
https://www.example.com/ch_de
https://www.example.com/ch_it
https://www.example.com/shop

I'm stuck between these approaches:

https:\/\/www.example.com\/\bde\_de
https:\/\/www.example.com\/[^de]{2,3}[^de]
https:\/\/www.example.com\/[a-z]{2,3}\_[^d][^e]
https:\/\/www.example.com\/([a-z]{2,3}\_)(?!^de$)
https:\/\/www.example.com\/[a-z]{2,3}\_
https:\/\/www.example.com\/(?!^de\_de$)

How can I use a negative lookahead with a string that contains a special character (the underscore)? Could I go with something like this:

(?!^de_de$)
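A minimal check of this idea, assuming Python's re module with re.search and no flags (the link checker may apply the pattern differently). Inside a longer pattern, ^ only matches at the very start of the string, so ^de_de$ can never succeed right after ".com/", and the negative lookahead never rejects anything. The underscore itself is not a regex metacharacter, so it needs no escaping.

```python
import re

# Two URLs from the question: de_de should be excluded, be_de should match.
urls = [
    "https://www.example.com/de_de",
    "https://www.example.com/be_de",
]

# ^ inside the lookahead: without re.MULTILINE, ^ only matches at position 0,
# so "^de_de$" can never match right after ".com/". The negative lookahead
# therefore always succeeds and de_de is NOT excluded.
anchored = r"https://www\.example\.com/(?!^de_de$)"

# Without ^ the lookahead tests the text that directly follows the slash.
unanchored = r"https://www\.example\.com/(?!de_de$)"


def matching(pattern, candidates):
    """Return the candidates for which re.search finds a match."""
    return [u for u in candidates if re.search(pattern, u)]


print(matching(anchored, urls))    # both URLs match
print(matching(unanchored, urls))  # only be_de matches

# The underscore is an ordinary character: "_" and "\_" behave the same.
print(bool(re.fullmatch(r"de_de", "de_de")), bool(re.fullmatch(r"de\_de", "de_de")))
```

Dropping the ^ is enough for the bare de_de case; excluding paths below de_de as well needs something extra after de_de in the lookahead.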

I'm new to regular expressions; any help or input is welcome.

Answer 1

You can try:

https:\/\/www.example.com\/.+?(?<!de_de)\b

This matches:

https://www.example.com/shop

But not:

https://www.example.com/de_de

Pythex link here

Explanation: here we apply the negative lookbehind (?<!de_de) to a word boundary (\b). This means we must find a word boundary that is not preceded by "de_de".
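A quick way to try this pattern locally, assuming Python's re.search semantics; the URL list is the one from the question. The last line uses a hypothetical sub-page URL (not in the question) to show what the lookbehind does when de_de is followed by more path.

```python
import re

# Answer 1's pattern as a Python raw string.
rx = r"https:\/\/www.example.com\/.+?(?<!de_de)\b"

urls = [
    "https://www.example.com/int_en",
    "https://www.example.com/int_de",
    "https://www.example.com/de_de",
    "https://www.example.com/be_de",
    "https://www.example.com/fr_fr",
    "https://www.example.com/gb_en",
    "https://www.example.com/us_en",
    "https://www.example.com/ch_de",
    "https://www.example.com/ch_it",
    "https://www.example.com/shop",
]

# .+? grows one character at a time; \b is accepted only where the five
# characters before it are not "de_de". Because "_" is a word character,
# the only boundary after "de_de" is the end of the string, so that URL fails.
results = {u: bool(re.search(rx, u)) for u in urls}
for url, ok in results.items():
    print(f"{url} => {'MATCHED' if ok else 'NOT MATCHED'}")

# Hypothetical sub-page: the boundary after "de_de/" is not preceded by
# "de_de", so this URL still matches.
print(bool(re.search(rx, "https://www.example.com/de_de/page")))  # True
```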

Answer 2

Use the following regex:

https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+

See the regex demo. If you also want to match http, add s? after http in the pattern: https?://www\.example\.com/(?!de_de(?:/|$))[a-z_]+

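Note that you should escape the dots to match literal dots in the string. The (?!de_de(?:/|$))[a-z_]+ part matches one or more letters/underscores (see [a-z_]+), provided the text at that position is not de_de followed by / or the end of the string.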

Python demo:

import re

# Test URLs, including de_de (the one that must be excluded)
ex = ["https://www.example.com/int_en","https://www.example.com/int_de","https://www.example.com/de_de","https://www.example.com/be_de","https://www.example.com/de_en","https://www.example.com/fr_en","https://www.example.com/fr_fr","https://www.example.com/gb_en","https://www.example.com/us_en","https://www.example.com/ch_de","https://www.example.com/ch_it"]
# Negative lookahead: reject the path if it starts with de_de followed by "/" or the end of the string
rx = r"https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+"
for s in ex:
    m = re.search(rx, s)
    if m:
        print("{} => MATCHED".format(s))
    else:
        print("{} => NOT MATCHED".format(s))

Output:

https://www.example.com/int_en => MATCHED
https://www.example.com/int_de => MATCHED
https://www.example.com/de_de => NOT MATCHED
https://www.example.com/be_de => MATCHED
https://www.example.com/de_en => MATCHED
https://www.example.com/fr_en => MATCHED
https://www.example.com/fr_fr => MATCHED
https://www.example.com/gb_en => MATCHED
https://www.example.com/us_en => MATCHED
https://www.example.com/ch_de => MATCHED
https://www.example.com/ch_it => MATCHED
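A small extra check of the http/https variant and of the (?:/|$) group in this answer, assuming Python's re.search. The URLs here are hypothetical, chosen only to exercise each branch of the lookahead.

```python
import re

# Answer 2's pattern with the optional "s" so that http:// URLs are accepted too.
rx = r"https?://www\.example\.com/(?!de_de(?:/|$))[a-z_]+"

# Hypothetical URLs mapped to whether they should match.
cases = {
    "http://www.example.com/int_en": True,        # plain http accepted via s?
    "https://www.example.com/de_de": False,       # de_de at end of string
    "https://www.example.com/de_de/shop": False,  # de_de followed by /
    "https://www.example.com/de_de_old": True,    # de_de followed by "_": lookahead does not apply
}

for url, expected in cases.items():
    matched = re.search(rx, url) is not None
    print(f"{url} => {'MATCHED' if matched else 'NOT MATCHED'}")
    assert matched == expected
```

The (?:/|$) group is what limits the exclusion to a path segment that is exactly de_de, rather than any segment that merely starts with those five characters.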
