How to simultaneously search for two possible quotation marks with regular expressions?

Question

I want to extract the text inside quotation marks if it is one or two words long. This works with the following code.

import re

mysentences = ['Kids, you "tried" your "best" and you failed miserably. The "lesson" is, "never try."',
               "Just because I don’t 'care' doesn’t mean I don’t understand."]
quotation = []
# Double quote, one or two words (each optionally followed by spaces or dots), double quote
rx = r'"((?:\w+[ .]*){1,2})"'
for sentence in mysentences:
    quotation.append(re.findall(rx, sentence))
print(quotation)

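But this doesn't get me 'care' from the second sentence, because there the word is wrapped in single quotes (the string itself is delimited with double quotes). I can get it with the following pattern: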

r"'((?:\w+[ .]*){1,2})'"

The question is, how do I join the two conditions?

rx = r'"((?:\w+[ .]*){1,2})"' or r"'((?:\w+[ .]*){1,2})'"

This only gets me the first condition mentioned.

Answer 1

With your current pattern, you can use a capturing group and a backreference \1 to match the accompanying single or double quote.

The match will now be in the second capturing group.

(['"])((?:\w+[ .]*){1,2})\1


Note that the repeated character class [ .]* could also match something like never try... ....
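
To make that caveat concrete, here is a minimal sketch that runs the pattern with the \1 backreference against a made-up sentence. The sample text and the names `loose` and `loose_matches` are illustrative assumptions, not part of the original question.

```python
import re

# Opening quote, one or two "words" (each followed by any run of spaces/dots),
# then the same quote again via the backreference \1.
loose = re.compile(r"""(['"])((?:\w+[ .]*){1,2})\1""")

# Hypothetical sample sentence with trailing dots and spaces inside the quotes.
sample = 'He said "never try... ...." and left.'

# Group 2 holds the text between the quotes.
loose_matches = [m.group(2) for m in loose.finditer(sample)]
print(loose_matches)  # ['never try... ....']
```

Because [ .]* accepts any mix of spaces and dots after each word, the whole run of trailing punctuation ends up in the captured text.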

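If you want to match 1 or 2 words with an optional dot at the end, you can match 1+ word characters, followed by an optional group that matches 1+ spaces and 1+ word characters, followed by an optional dot.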

(['"])(\w+(?: +\w+)?\.?)\1

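
As a rough illustration of how the stricter pattern behaves, the sketch below applies it (with the \1 backreference) to a few sentences. The last two samples and the names `strict`, `samples`, and `strict_matches` are assumptions added for the demonstration.

```python
import re

# Quote, one word, optionally 1+ spaces and a second word, optional dot, same quote.
strict = re.compile(r"""(['"])(\w+(?: +\w+)?\.?)\1""")

samples = [
    'The "lesson" is, "never try."',
    "Just because I don’t 'care' doesn’t mean I don’t understand.",
    'He said "never try... ...." and left.',   # hypothetical: extra dots inside quotes
    'A "far too long" quote.',                  # hypothetical: three words inside quotes
]

# Collect group 2 (the quoted text) for every match in each sentence.
strict_matches = [[m.group(2) for m in strict.finditer(s)] for s in samples]
for found in strict_matches:
    print(found)
# ['lesson', 'never try.']
# ['care']
# []
# []
```

The typographic apostrophes in "don’t" and "doesn’t" are a different character from the straight single quote, so they do not open or close a match.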

For example

import re

mysentences = ['Kids, you "tried" your "best" and you failed miserably. The "lesson" is, "never try."',
               "Just because I don’t 'care' doesn’t mean I don’t understand."]
quotation = []
# Group 1 captures the opening quote; \1 requires the same quote to close the match
rx = r"(['\"])((?:\w+[ .]*){1,2})\1"
for sentence in mysentences:
    # With two groups, findall returns (quote, text) tuples; keep only the text
    for m in re.findall(rx, sentence):
        quotation.append(m[1])

print(quotation)

Output

['tried', 'best', 'lesson', 'never try.', 'care']
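
As for why joining the two patterns with `or` only applied the first one: Python evaluates `or` between two strings before `re` ever sees them, and it returns the first truthy operand. A short sketch of that behavior, using the two patterns from the question (the variable names `rx_double`, `rx_single`, `combined`, and `found` are illustrative):

```python
import re

rx_double = r'"((?:\w+[ .]*){1,2})"'
rx_single = r"'((?:\w+[ .]*){1,2})'"

# `or` returns its first truthy operand; a non-empty string is truthy,
# so the single-quote pattern is silently discarded.
combined = rx_double or rx_single
print(combined == rx_double)  # True

# The combined value is just the double-quote pattern, so 'care' is not found.
found = re.findall(combined, "Just because I don’t 'care' doesn’t mean I don’t understand.")
print(found)  # []
```

Alternatives have to be expressed inside the regular expression itself, which is what the character class ['"] together with the backreference does.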

python

regex

quotation-marks