How to return the full word for a partial substring match in Python as a list?

Question

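I have strings of varying length and need to check them for substrings matching the patterns "tion", "ex", "ph", "ost", "ast", "ist", ignoring case and position (the match may be a prefix, a suffix, or in the middle of a word). I need to return the complete matching words in a new list, not just the matched substrings. With the code below, I get a new list of the matched substrings, without the full words that contain them.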

def latin_ish_words(text):
    import re
    pattern=re.compile(r"tion|ex|ph|ost|ast|ist")
    matches=pattern.findall(text)
    return matches
latin_ish_words("This functions as expected")

The result is: ['tion', 'ex']

How can I return the whole words, rather than the matched substrings, in the new list?

Answer 1

You can use any of the following:

pattern=re.compile(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*")
pattern=re.compile(r"[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*")
pattern=re.compile(r"[^\W\d_]*?(?:tion|ex|ph|ost|ast|ist)[^\W\d_]*")

The first regex matches:

\w*? - zero or more word characters, as few as possible
(?:tion|ex|ph|ost|ast|ist) - one of the listed substrings
\w* - zero or more word characters, as many as possible

The [a-zA-Z] variant matches only ASCII letters, while [^\W\d_] matches any Unicode letter.
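To make the difference concrete, here is a minimal sketch that runs both letter-only patterns on a made-up sample string containing an accented letter. The sample text and variable names are illustrative assumptions, not part of the original answer.

```python
import re

ascii_pat = re.compile(r"[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*")
unicode_pat = re.compile(r"[^\W\d_]*?(?:tion|ex|ph|ost|ast|ist)[^\W\d_]*")

# Hypothetical sample input; \u00e9 is the precomposed letter "é"
text = "pr\u00e9existant phase"

ascii_hits = ascii_pat.findall(text)
unicode_hits = unicode_pat.findall(text)

# The ASCII class cannot cross "é", so the first word is cut short
print(ascii_hits)    # ['existant', 'phase']
# The Unicode-aware class keeps the whole word
print(unicode_hits)  # ['préexistant', 'phase']
```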

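Note the non-capturing group: when a pattern contains a capturing group, re.findall returns the captured substrings instead of the whole matches.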
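A short sketch of that difference, using the sentence from the question. The only change between the two patterns is `(` versus `(?:`; the variable names are illustrative.

```python
import re

text = "This functions as expected"

# Capturing group: findall returns what the group captured
capturing = re.findall(r"\w*?(tion|ex|ph|ost|ast|ist)\w*", text)

# Non-capturing group: findall returns the whole match
non_capturing = re.findall(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*", text)

print(capturing)      # ['tion', 'ex']
print(non_capturing)  # ['functions', 'expected']
```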

If you only need to match words made of letters, and want to match them as whole words, add word boundaries: r"\b[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*\b".
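The effect of the word boundaries shows up on tokens that mix letters with digits or underscores. The following is a small sketch on a made-up input string; `ALTS`, `loose`, and `strict` are names chosen here for illustration.

```python
import re

ALTS = r"(?:tion|ex|ph|ost|ast|ist)"
loose = re.compile(r"[a-zA-Z]*?" + ALTS + r"[a-zA-Z]*")
strict = re.compile(r"\b[a-zA-Z]*?" + ALTS + r"[a-zA-Z]*\b")

text = "ex2 post_office phone"  # hypothetical sample input

loose_hits = loose.findall(text)
strict_hits = strict.findall(text)

# Without \b, letter runs inside larger tokens are returned as fragments
print(loose_hits)   # ['ex', 'post', 'phone']
# With \b, only tokens consisting purely of letters are returned
print(strict_hits)  # ['phone']
```

Digits and underscores count as word characters, so there is no `\b` between "x" and "2" or between "t" and "_", which is why those fragments are rejected.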

A complete example:

import re

def latin_ish_words(text):
    # Match whole words that contain any of the target substrings
    pattern = re.compile(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*")
    return pattern.findall(text)

print(latin_ish_words("This functions as expected"))
# => ['functions', 'expected']

Answer 2

Regarding ignoring case: the pattern

pattern=re.compile(r"tion|ex|ph|ost|ast|ist")
matches=pattern.findall(text)

does not do that. Consider the following example:

import re
pattern=re.compile(r"tion|ex|ph|ost|ast|ist")
text = "SCREAMING TEXT"
print(pattern.findall(text))

Output:

[]

The result is empty even though EX should be matched. Add the re.IGNORECASE flag like this:

import re
pattern=re.compile(r"tion|ex|ph|ost|ast|ist", re.IGNORECASE)
text = "SCREAMING TEXT"
print(pattern.findall(text))

Output:

['EX']
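The flag also combines with a pattern that returns whole words. The sketch below assumes the `\w*?(?:...)\w*` form and a made-up input string; it is one possible combination, not code from this answer.

```python
import re

# Whole-word pattern plus case-insensitive matching
pattern = re.compile(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*", re.IGNORECASE)

text = "SCREAMING TEXT and Functions"  # hypothetical sample input
words = pattern.findall(text)
print(words)  # ['TEXT', 'Functions']
```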

Answer 3

For a case-insensitive match with whitespace boundaries, you can use:

(?i)(?<!\S)\w*(?:tion|ex|ph|[oia]st)\w*(?!\S)

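The pattern matches:

(?i) - inline modifier for case-insensitive matching (or use re.I)
(?<!\S) - assert a whitespace boundary to the left
\w* - match optional word characters
(?: - start of the non-capturing group
tion|ex|ph|[oia]st - match tion, ex, ph, or (using a character class) ost, ist, ast
) - close the non-capturing group
\w* - match optional word characters
(?!\S) - assert a whitespace boundary to the right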


def latin_ish_words(text):
    import re
    pattern = r"(?i)(?<!\S)\w*(?:tion|ex|ph|[oia]st)\w*(?!\S)"
    return re.findall(pattern, text)

print(latin_ish_words("This functions as expected"))

Output:

['functions', 'expected']
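One consequence of whitespace boundaries is worth seeing: a word that is directly followed by punctuation is not matched, because the punctuation is a non-whitespace character. The sketch below compares the pattern above with a `\b` variant on a made-up sentence; the `\b` variant and the variable names are assumptions added here for comparison.

```python
import re

ws_pattern = r"(?i)(?<!\S)\w*(?:tion|ex|ph|[oia]st)\w*(?!\S)"  # whitespace boundaries
wb_pattern = r"(?i)\b\w*(?:tion|ex|ph|[oia]st)\w*\b"           # word boundaries

text = "An exception, as expected."  # hypothetical sample input

ws_hits = re.findall(ws_pattern, text)
wb_hits = re.findall(wb_pattern, text)

# "exception," and "expected." touch punctuation, so (?!\S) rejects them
print(ws_hits)  # []
# \b treats punctuation as a boundary, so both words are returned
print(wb_hits)  # ['exception', 'expected']
```

Which boundary is appropriate depends on whether the input is clean, space-separated tokens or ordinary prose with punctuation.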
