Find consecutive capitalized words in a string, including apostrophes

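I'm using a regular expression to find every run of consecutive capitalized words, where some of those words contain an apostrophe, e.g. "The mother-daughter bakery, Molly's Munchies, was founded in 2009". I've written a few lines of code to do this: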

import re

string = "The mother-daughter bakery, Molly’s Munchies, was founded in 2009"
test = re.findall(r"([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z]+)+)", string)
print(test)

The problem is that I can't get the result ('Molly's Munchies') to print.

Instead, my output is:

('[]')

Desired output:

("Molly's Munchies")

Any help is appreciated, thanks!

You can use this regex in Python:

r"\b[A-Z][a-z'’]*(?:\s+[A-Z][a-z'’]*)+"


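Regex details:

  • \b: word boundary
  • [A-Z]: matches one uppercase letter
  • [a-z'’]*: matches 0 or more characters that are lowercase letters, ' or ’
  • (?:\s+[A-Z][a-z'’]*)+: matches 1 or more further capitalized words of the same form, each preceded by whitespace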
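As a quick check, here is a minimal sketch that runs the pattern above against the sentence from the question, once with a curly apostrophe and once with a straight one. The helper name find_capitalized_runs is only illustrative and not part of the original answer.

```python
import re

# Pattern from the answer above; the character class accepts both the
# straight (') and the curly (’) apostrophe.
PATTERN = re.compile(r"\b[A-Z][a-z'’]*(?:\s+[A-Z][a-z'’]*)+")


def find_capitalized_runs(text):
    """Return every run of two or more consecutive capitalized words."""
    # The pattern has no capturing group, so findall returns whole matches.
    return PATTERN.findall(text)


curly = "The mother-daughter bakery, Molly’s Munchies, was founded in 2009"
straight = "The mother-daughter bakery, Molly's Munchies, was founded in 2009"

print(find_capitalized_runs(curly))     # ['Molly’s Munchies']
print(find_capitalized_runs(straight))  # ["Molly's Munchies"]
```

"The" at the start of the sentence is not reported because the next word, "mother-daughter", begins with a lowercase letter.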

You need to add the apostrophe in both places where the pattern defines a "word". You only added it in one place.

import re

string = "The Cow goes moo, and the Dog's Name is orange"
# e.g. both                 here                    and here
#                            v                           v
print(re.findall(r"([A-Z][a-z']+(?=\s[A-Z])(?:\s[A-Z][a-z']+)+)", string))
# ['The Cow', "Dog's Name"]
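To see why both character classes matter, the sketch below compares a partial fix with the full one on the same sentence. The one_place pattern is a hypothetical variant written for illustration, with the apostrophe added only to the second word class; it is not taken from the question.

```python
import re

text = "The Cow goes moo, and the Dog's Name is orange"

# Hypothetical partial fix: apostrophe allowed only in the second word class.
one_place = r"([A-Z][a-z]+(?=\s[A-Z])(?:\s[A-Z][a-z']+)+)"
# Apostrophe allowed in both word classes, as in the answer above.
both_places = r"([A-Z][a-z']+(?=\s[A-Z])(?:\s[A-Z][a-z']+)+)"

one_place_matches = re.findall(one_place, text)
both_places_matches = re.findall(both_places, text)

print(one_place_matches)    # ['The Cow']
print(both_places_matches)  # ['The Cow', "Dog's Name"]
```

With the partial fix, the first word class stops at "Dog", the lookahead then sees an apostrophe instead of whitespace, and "Dog's Name" is never matched.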