Regex negative lookahead string with special character python

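This concerns the content dimensions of a website. The link checker tool supports Python regular expressions. With the link checker, I want to get information about only one content dimension.

I want to match everything except the string de_de (for the --no-follow-url option).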

https://www.example.com/int_en
https://www.example.com/int_de
https://www.example.com/de_de  ##should not match or all others should match
https://www.example.com/be_de
https://www.example.com/fr_fr
https://www.example.com/gb_en
https://www.example.com/us_en
https://www.example.com/ch_de
https://www.example.com/ch_it
https://www.example.com/shop

I am stuck between these approaches:

https:\/\/www.example.com\/\bde\_de
https:\/\/www.example.com\/[^de]{2,3}[^de]
https:\/\/www.example.com\/[a-z]{2,3}\_[^d][^e]
https:\/\/www.example.com\/([a-z]{2,3}\_)(?!^de$)
https:\/\/www.example.com\/[a-z]{2,3}\_
https:\/\/www.example.com\/(?!^de\_de$)

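How can I use a negative lookahead to match a string that contains a special character (an underscore)? Can I use something like this?

(?!^de_de$)

I am new to regex; any help or input is welcome.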

You can try:

https:\/\/www.example.com\/.+?(?<!de_de)\b

This matches:

https://www.example.com/shop

But not:

https://www.example.com/de_de

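Explanation: here we use a negative lookbehind, (?<!de_de), applied to a word boundary (\b). This means we have to find a word boundary that is not preceded by "de_de".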
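
To make that explanation concrete, here is a minimal sketch that runs the lookbehind pattern through Python's re.search. The third URL (a page below de_de) is an assumption added for illustration and does not appear in the question; it suggests that this pattern only rejects de_de when it ends the URL.

```python
import re

# Pattern from the answer above: lazily consume characters until a word
# boundary that is not immediately preceded by "de_de".
pattern = re.compile(r"https:\/\/www.example.com\/.+?(?<!de_de)\b")

urls = [
    "https://www.example.com/shop",
    "https://www.example.com/de_de",
    "https://www.example.com/de_de/page",  # hypothetical sub-page
]

results = {url: bool(pattern.search(url)) for url in urls}
for url, matched in results.items():
    print(url, "=>", "MATCHED" if matched else "NOT MATCHED")
```

The sub-page still matches because the word boundary after the following / is no longer preceded by de_de, so this approach may need tightening if such URLs occur.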

Use the following regex:

https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+

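If you also want to match http, make the s optional by adding ? after it: https?://www\.example\.com/(?!de_de(?:/|$))[a-z_]+

Note that you should escape the dots so that they match literal dots in the string. The (?!de_de(?:/|$))[a-z_]+ part matches one or more letters or underscores (see [a-z_]+), provided they are not de_de followed by / or the end of the string.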
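
As a rough check of how the lookahead treats edge cases, the sketch below tries the http-tolerant variant on a few inputs. All four URLs are assumed for illustration; in particular, de_de/page and de_dex are not from the question.

```python
import re

# Optional "s" so that both http and https are accepted.
rx = re.compile(r"https?://www\.example\.com/(?!de_de(?:/|$))[a-z_]+")

# Illustrative inputs (assumed, not taken from the question).
cases = [
    "https://www.example.com/de_de",
    "https://www.example.com/de_de/page",
    "https://www.example.com/de_dex",
    "http://www.example.com/fr_fr",
]

matched = {url: rx.search(url) is not None for url in cases}
for url, ok in matched.items():
    print(url, "=>", "MATCHED" if ok else "NOT MATCHED")
```

Because the lookahead requires / or the end of the string right after de_de, a longer segment such as de_dex is still accepted, while de_de and anything below it is rejected.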

Python demo:

import re

# Sample URLs; only the de_de entry should be rejected.
ex = [
    "https://www.example.com/int_en", "https://www.example.com/int_de",
    "https://www.example.com/de_de", "https://www.example.com/be_de",
    "https://www.example.com/de_en", "https://www.example.com/fr_en",
    "https://www.example.com/fr_fr", "https://www.example.com/gb_en",
    "https://www.example.com/us_en", "https://www.example.com/ch_de",
    "https://www.example.com/ch_it",
]
# The lookahead rejects a first path segment that is exactly de_de.
rx = r"https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+"
for s in ex:
    print("{} => {}".format(s, "MATCHED" if re.search(rx, s) else "NOT MATCHED"))

Output:

https://www.example.com/int_en => MATCHED
https://www.example.com/int_de => MATCHED
https://www.example.com/de_de => NOT MATCHED
https://www.example.com/be_de => MATCHED
https://www.example.com/de_en => MATCHED
https://www.example.com/fr_en => MATCHED
https://www.example.com/fr_fr => MATCHED
https://www.example.com/gb_en => MATCHED
https://www.example.com/us_en => MATCHED
https://www.example.com/ch_de => MATCHED
https://www.example.com/ch_it => MATCHED
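
As for the (?!^de_de$) attempt in the question, one likely reason it does not work is that ^ asserts the start of the string, which can never hold right after the host part, so the negative lookahead always succeeds. A small sketch of the difference, with illustrative variable names:

```python
import re

url = "https://www.example.com/de_de"

# Attempt from the question: "^" cannot match mid-string, so the
# negative lookahead never blocks anything.
with_anchor = re.search(r"https://www\.example\.com/(?!^de_de$)", url)

# Dropping "^" lets the lookahead actually see "de_de" at this position.
without_anchor = re.search(r"https://www\.example\.com/(?!de_de$)", url)

print(with_anchor is not None)
print(without_anchor is not None)
```

With the anchor in place the de_de URL is still matched; without it the URL is rejected.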