Regex negative lookahead for a string with a special character (Python)
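This concerns the content dimensions of a website. The link checker tool I use supports Python regular expressions, and I want it to report on only one content dimension.
For the --no-follow-url option, I want to match everything except the string de_de.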
https://www.example.com/int_en
https://www.example.com/int_de
https://www.example.com/de_de ##should not match or all others should match
https://www.example.com/be_de
https://www.example.com/fr_fr
https://www.example.com/gb_en
https://www.example.com/us_en
https://www.example.com/ch_de
https://www.example.com/ch_it
https://www.example.com/shop
I am stuck between these approaches:
https:\/\/www.example.com\/\bde\_de
https:\/\/www.example.com\/[^de]{2,3}[^de]
https:\/\/www.example.com\/[a-z]{2,3}\_[^d][^e]
https:\/\/www.example.com\/([a-z]{2,3}\_)(?!^de$)
https:\/\/www.example.com\/[a-z]{2,3}\_
https:\/\/www.example.com\/(?!^de\_de$)
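How can I use a negative lookahead to match a string that contains a special character (the underscore)? Can I use something like (?!^de_de$)?
I am new to regex, so any help or input is welcome.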
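You can try:
https:\/\/www.example.com\/.+?(?<!de_de)\b
This matches:
https://www.example.com/shop
But not:
https://www.example.com/de_de
Explanation: here we use a negative lookbehind, (?<!de_de), applied at a word boundary (\b). This means we must find a word boundary that is not immediately preceded by "de_de".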
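To see how the lookbehind pattern behaves beyond the two URLs shown, here is a minimal sketch that runs it with re.search. The first three URLs come from the question; the last two are hypothetical additions made up to probe the edges, and they suggest that this pattern still matches pages below de_de while rejecting any path that merely ends in de_de.

```python
import re

# Pattern from the answer above, unchanged
# (its unescaped dots match any character, not only a literal dot).
pattern = re.compile(r"https:\/\/www.example.com\/.+?(?<!de_de)\b")

urls = [
    "https://www.example.com/shop",
    "https://www.example.com/ch_de",
    "https://www.example.com/de_de",
    "https://www.example.com/de_de/page",  # hypothetical subpage of de_de
    "https://www.example.com/xde_de",      # hypothetical path that only ends in de_de
]

# Map each URL to True (matched) or False (not matched).
results = {url: bool(pattern.search(url)) for url in urls}
for url, matched in results.items():
    print(url, "=>", "MATCHED" if matched else "NOT MATCHED")
```

If subpages of de_de must also be excluded, a lookahead placed right after the slash is probably the safer choice.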
Use the following regex:
https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+
If you also want to match http, make the s optional by writing s? in the pattern: https?://www\.example\.com/(?!de_de(?:/|$))[a-z_]+
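As a quick check of the optional s, the sketch below tries the https? variant on both schemes. The http URLs are hypothetical, since the question only lists https ones.

```python
import re

# Same pattern as above, with the "s" made optional.
rx_any_scheme = re.compile(r"https?://www\.example\.com/(?!de_de(?:/|$))[a-z_]+")

scheme_checks = {
    url: bool(rx_any_scheme.search(url))
    for url in (
        "http://www.example.com/fr_fr",   # hypothetical http URL
        "https://www.example.com/fr_fr",
        "http://www.example.com/de_de",   # hypothetical http URL
    )
}
print(scheme_checks)
```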
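Note that you should escape the dots so that they match literal dots in the string. The (?!de_de(?:/|$))[a-z_]+ part matches one or more lowercase letters or underscores (the [a-z_]+), provided that what follows the slash is not de_de followed by / or the end of the string.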
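The role of the (?:/|$) group is easiest to see with a few edge cases. The URLs in this sketch are hypothetical and chosen only to probe the lookahead: de_de on its own, with a trailing slash, or with a subpage is rejected, while longer segments that merely begin with de_de still match.

```python
import re

rx = re.compile(r"https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+")

# Hypothetical URLs, not taken from the question.
probes = [
    "https://www.example.com/de_de",       # de_de at end of string
    "https://www.example.com/de_de/",      # de_de followed by /
    "https://www.example.com/de_de/page",  # subpage of de_de
    "https://www.example.com/de_dex",      # longer segment starting with de_de
    "https://www.example.com/de_de_at",    # longer segment starting with de_de
]

# Map each URL to True (matched) or False (not matched).
outcome = {url: rx.search(url) is not None for url in probes}
for url, matched in outcome.items():
    print(url, "=>", "MATCHED" if matched else "NOT MATCHED")
```

Without (?:/|$), the lookahead (?!de_de) would also reject de_dex and de_de_at, which may or may not be what you want.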
import re

# Sample URLs; only the de_de entry should be rejected.
ex = ["https://www.example.com/int_en","https://www.example.com/int_de","https://www.example.com/de_de","https://www.example.com/be_de","https://www.example.com/de_en","https://www.example.com/fr_en","https://www.example.com/fr_fr","https://www.example.com/gb_en","https://www.example.com/us_en","https://www.example.com/ch_de","https://www.example.com/ch_it"]
rx = r"https://www\.example\.com/(?!de_de(?:/|$))[a-z_]+"
for s in ex:
    # re.search returns a match object, or None when the pattern does not match.
    m = re.search(rx, s)
    if m:
        print("{} => MATCHED".format(s))
    else:
        print("{} => NOT MATCHED".format(s))
Output:
https://www.example.com/int_en => MATCHED
https://www.example.com/int_de => MATCHED
https://www.example.com/de_de => NOT MATCHED
https://www.example.com/be_de => MATCHED
https://www.example.com/de_en => MATCHED
https://www.example.com/fr_en => MATCHED
https://www.example.com/fr_fr => MATCHED
https://www.example.com/gb_en => MATCHED
https://www.example.com/us_en => MATCHED
https://www.example.com/ch_de => MATCHED
https://www.example.com/ch_it => MATCHED
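As for the (?!^de\_de$) attempt from the question: without re.MULTILINE, ^ only matches at the very start of the string, so inside a lookahead placed after the slash it can never succeed, and the negative lookahead therefore always passes. The sketch below illustrates this with re.search, as in the script above. How the link checker itself applies the pattern is an assumption here, and the name anchored_attempt is only for illustration.

```python
import re

# Attempt from the question: ^ and $ sit inside the lookahead.
# The escaped \/ and \_ are harmless in Python; they match / and _.
anchored_attempt = re.compile(r"https:\/\/www.example.com\/(?!^de\_de$)")

# Lookahead without ^, checked right after the slash.
lookahead_only = re.compile(r"https://www\.example\.com/(?!de_de(?:/|$))")

url = "https://www.example.com/de_de"
attempt_matches = bool(anchored_attempt.search(url))
fixed_matches = bool(lookahead_only.search(url))

print("attempt matches de_de:", attempt_matches)
print("lookahead without ^ matches de_de:", fixed_matches)
```

The underscore itself needs no special treatment, since it is an ordinary character in a regex.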