How to combine two re.compile regexes in Python 3?

Question

I want to combine two regular expressions into one line.

 soup1=link.findAll('a', attrs={'href': re.compile('^http://')})
 soup2=link.findAll('a', attrs={'href': re.compile("/news/")})

I tried using the pipe (|) sign, as in re.compile('^http://' | '/news/'), but to no avail. I need both functionalities (links containing 'http' as well as /news/).

Answer 1

Try this:

re.compile(r'(^http://)|(/news/)')

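What you tried, re.compile('^http://' | '/news/'), was almost right. The | has to be inside the pattern string; outside the quotes it is Python's | operator applied to two str objects, which raises a TypeError. Put both alternatives inside one pair of quotes instead: re.compile('^http://|/news/').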
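A minimal check using only the re module (the sample hrefs are made up for illustration). It tests the patterns with re.search, which scans the whole string; as far as I know this is also how BeautifulSoup applies a compiled pattern to an attribute value, which is why re.compile("/news/") in the question finds /news/ in the middle of a URL.

```python
import re

# Outside a string literal, | is Python's operator, not regex alternation:
# '^http://' | '/news/' is evaluated before re.compile() is even called.
try:
    re.compile('^http://' | '/news/')
except TypeError as exc:
    print("TypeError:", exc)

# Alternation written inside a single pattern string.
combined = re.compile(r'^http://|/news/')
# Answer 1's form: the same matches, plus two capture groups.
grouped = re.compile(r'(^http://)|(/news/)')

hrefs = [
    "http://example.com/",        # starts with http://
    "/news/2020/story.html",      # contains /news/
    "https://example.com/about",  # neither: https:// does not start with http://
]
for href in hrefs:
    # search() scans the whole string; ^ still pins the first branch to position 0.
    print(href, bool(combined.search(href)), bool(grouped.search(href)))
```

Both compiled patterns accept the same strings; the parentheses in the grouped version only add capture groups, which findAll does not use.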

Answer 2

You don't need a regex; you can use CSS selectors:

 soup.select('a[href^="http://"], a[href*="/news/"]')

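^= matches an href that starts with the given substring, and *= matches an href that contains the substring anywhere. The attribute values are quoted because http:// and /news/ are not valid bare CSS identifiers. The comma combines the two selectors, so a link matching either one is returned.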
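If you would rather skip both regex and selectors, the two conditions map directly onto plain string methods. A minimal sketch; href_matches is a hypothetical helper name and the sample values are invented:

```python
def href_matches(href):
    """Hypothetical helper: True if href starts with 'http://' or contains '/news/'.

    str.startswith plays the role of ^= and the `in` operator plays the role of *=.
    href may be None when a tag has no href attribute.
    """
    if href is None:
        return False
    return href.startswith("http://") or "/news/" in href


samples = ["http://example.com/", "/news/today", "/about", None]
print([href_matches(h) for h in samples])  # [True, True, False, False]
```

As far as I know, BeautifulSoup also accepts a function as an attribute filter and calls it with the attribute value (None when the attribute is missing), so something like link.find_all('a', href=href_matches) should select the same links.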

Answer 3

To answer the question:

I want to combine two regex's into one line... I need both functionalities (Links containing 'http' as well as /news/)

I read "as well as" as a requirement that both http and /news/ appear in the string. In that case you can simply use:

re.compile(r'^http://.*/news/')

It matches http:// at the start of the string and then the /news/ substring somewhere after it.
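A quick check of this stricter "both conditions" reading with the re module (the sample hrefs are made up):

```python
import re

# Both conditions at once: starts with http:// AND contains /news/ later on.
both = re.compile(r'^http://.*/news/')

print(bool(both.search("http://example.com/news/story")))  # True
print(bool(both.search("/news/story")))                    # False: no http:// at the start
print(bool(both.search("http://example.com/sports")))      # False: no /news/
```

Because the pattern starts with ^ and uses .*, /news/ has to appear somewhere after the leading http://; a relative link such as /news/story is rejected.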

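Pattern details:

If you instead want alternation (http at the start, or /news/ anywhere inside), note that the | alternation operator is used inside a regex pattern, not between separate patterns passed to re.compile: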

re.compile(r'^http://|/news/')
                     ^

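Here, ^ belongs only to http (the first branch). ^http:// matches http:// at the start of the string, or the /news/ branch matches /news/ anywhere in the string. So every value that starts with http:// or contains /news/ will be matched.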
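To see that ^ binds only to the first branch, compare the pattern with a version whose branches are wrapped in a non-capturing group (?:...). The grouped pattern is shown only for contrast, and the sample hrefs are invented:

```python
import re

# | has the lowest precedence, so ^ applies only to the first branch.
either = re.compile(r'^http://|/news/')
# Wrapping the branches in a non-capturing group makes ^ apply to both.
both_anchored = re.compile(r'^(?:http://|/news/)')

href = "ftp://example.com/news/today"
print(bool(either.search(href)))         # True: the /news/ branch matches mid-string
print(bool(both_anchored.search(href)))  # False: neither branch matches at position 0
```

With the group, only hrefs that begin with http:// or with /news/ would match, which is a different filter from the one asked for.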

Answer 4

This works for me (palabra holds the word to look for in the href):

import re
nombre = soup.findAll('a', {'href': re.compile('^http|' + re.escape(palabra), flags=re.IGNORECASE)})
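The answer above builds the pattern from a variable. A generalized sketch of that idea, assuming you want to OR together a required prefix and any number of literal substrings: build_href_pattern is a hypothetical helper, and re.escape() is used so characters such as . or ? in the inputs are matched literally rather than as regex syntax.

```python
import re


def build_href_pattern(prefix, words):
    """Hypothetical helper: one case-insensitive pattern that matches when the
    href starts with `prefix` OR contains any of the literal `words`."""
    # ^ binds only to this first branch, as explained in Answer 3.
    branches = ['^' + re.escape(prefix)] + [re.escape(w) for w in words]
    # '|'.join() turns the escaped pieces into a single alternation.
    return re.compile('|'.join(branches), flags=re.IGNORECASE)


pattern = build_href_pattern("http://", ["/news/", "noticias"])
for href in ["HTTP://example.com", "/es/Noticias/hoy", "/about"]:
    print(href, bool(pattern.search(href)))
```

The compiled object can then be passed as the href filter in findAll, just like the single-line version.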
