How to append something to the beginning of Regex matches?

Question

This is the regex code:

without_header = re.findall(r'/sports/[a-z0-9\/\.\-\:]*[0-9\.]+cms', without_header_url)

It returns every URL without the https header in front. For example:

/sports/cricket/ipl/top-stories/kxip-vs-csk-shane-watson-faf-du-plessis-infuse-life-into-csks-ipl-campaign-shape-confidence-boosting-win-over-kxip/articleshow/78481088.cms
/sports/football/epl/top-stories/epl-manchester-united-humiliated-as-mourinhos-spurs-win-6-1-at-old-trafford/articleshow/78481304.cms

I want to prepend "https://example.com" to each of these. I don't want a for loop; is there an efficient way to do this with re.sub?

Answer 1

You can use this regex in re.sub:

(?<!:/)(/sports/[a-z0-9/.:-]*[0-9.]+cms)


Code:

s = re.sub(r'(?<!:/)(/sports/[a-z0-9/.:-]*[0-9.]+cms)', r'https://example.com\1', s)

RegEx details:

(?<!:/): Negative lookbehind asserting that the match is not immediately preceded by :/

(/sports/[a-z0-9/.:-]*[0-9.]+cms): Matches your text and captures it in group 1, so the replacement can refer back to it as \1
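
For anyone who wants to try this, here is a minimal runnable sketch of the substitution. It assumes the prefix is https://example.com, as in the question. The sample input and the names BASE, PATTERN and result are made up for illustration, and the paths are shortened versions of the ones in the question.

```python
import re

# Prefix from the question; BASE and PATTERN are illustrative names.
BASE = "https://example.com"
PATTERN = re.compile(r'(?<!:/)(/sports/[a-z0-9/.:-]*[0-9.]+cms)')

# Hypothetical sample input: two relative paths, one per line.
s = (
    "/sports/cricket/ipl/top-stories/articleshow/78481088.cms\n"
    "/sports/football/epl/top-stories/articleshow/78481304.cms"
)

# \1 re-inserts the text captured by group 1 right after the prefix,
# so every match is rewritten in a single call, without an explicit loop.
result = PATTERN.sub(BASE + r'\1', s)
print(result)

# A path immediately preceded by ":/" is skipped by the lookbehind.
untouched = PATTERN.sub(BASE + r'\1', "file://sports/a/1.cms")
```

Note that the lookbehind only checks the two characters directly before /sports/. If the input can already contain full URLs such as https://example.com/sports/..., the path inside them would still match, so a stricter guard may be needed in that case.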
