pandas python regex find all words that begin, end or contain '

Question

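I want to find all words and numbers that begin with, end with, or contain a '.

I tried writing the two regular expressions below. In the second one I added ?: to indicate that the text before or after the ' is optional, but I don't get the result I want. What am I doing wrong? I want to find I've, 'had, not', you're, 123'45 - basically everything that has a ' in it.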

import re
xyz="I've never 'had somebody [redacted-number] [redacted-number] [redacted-number] not. not' you're  123'45"


print(re.findall(r"\w+'\w+", xyz))
print(re.findall(r"(?:\w+)'(?:\w+)", xyz))

["I've", "you're", "123'45"]
["I've", "you're", "123'45"]

Answer 1

You can use either of these:

\w*(?!\B'\B)'\w*
\w+'\w*|'\w+

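See regex demo #1 / regex demo #2.

Details

\w*(?!\B'\B)'\w* - zero or more word characters, a ' character (one that is not both preceded and followed by a non-word character or the start/end of the string), zero or more word characters
\w+'\w*|'\w+ - one or more word characters, a ', then zero or more word characters; or a ' followed by one or more word characters.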
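
To make the two descriptions above concrete, here is a minimal sketch that runs both patterns side by side. The shortened `sample` string and the variable names are assumptions made for illustration, not the asker's exact input.

```python
import re

# Shortened sample text (an assumption for illustration, not the asker's full string)
sample = "I've never 'had somebody not. not' you're  123'45"

# Pattern 1: the negative lookahead rejects a ' that has no word character on either side
lookahead_pattern = r"\w*(?!\B'\B)'\w*"
# Pattern 2: spell out the two allowed shapes explicitly
alternation_pattern = r"\w+'\w*|'\w+"

lookahead_hits = re.findall(lookahead_pattern, sample)
alternation_hits = re.findall(alternation_pattern, sample)

print(lookahead_hits)    # ["I've", "'had", "not'", "you're", "123'45"]
print(alternation_hits)  # same list for this sample
```

On this sample both patterns return the same five tokens; the sketch does not try to show that they agree on every possible input.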

See the Python demo:

import re
xyz="I've never 'had somebody [redacted-number] [redacted-number] [redacted-number] not. not' you're  123'45"
print (re.findall(r"\w*(?!\B'\B)'\w*", xyz))
# => ["I've", "'had", "not'", "you're", "123'45"]

In pandas, you can use Series.str.findall:

df['result'] = df['source'].str.findall(r"\w*(?!\B'\B)'\w*")
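
For readers who want to try the pandas line in isolation, here is a small self-contained sketch. The frame contents are made up for illustration; only the `source` and `result` column names come from the answer.

```python
import pandas as pd

# Hypothetical data; only the column names follow the answer above
df = pd.DataFrame(
    {"source": ["I've never 'had", "not' you're 123'45", "no quotes here"]}
)

# Series.str.findall applies the regex to each row and returns a list per row
df["result"] = df["source"].str.findall(r"\w*(?!\B'\B)'\w*")

print(df["result"].tolist())
# [["I've", "'had"], ["not'", "you're", "123'45"], []]
```

Rows without any match get an empty list rather than a missing value.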

Answer 2

You want to capture every word that contains a ', don't you? Try this:

re.findall(r"\w*'\w*", xyz)

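This looks for any ' character with zero or more word characters before and after it. It matches all the required words in the sample string. Your attempts used \w+, which requires at least one word character on each side of the '. That is why they do not match 'had and not'.

After reading the other answers, I'd say Wiktor's answer is the best. Use that one.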
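
One practical difference between the two approaches, sketched below under the assumption that a stand-alone ' should not count as a word: \w*'\w* also matches an apostrophe with nothing around it, while the lookahead version skips it. The `text` string is invented for the demonstration.

```python
import re

# Invented input containing a stand-alone apostrophe
text = "a ' b don't"

# Both sides are optional, so a bare ' is a valid match
loose_hits = re.findall(r"\w*'\w*", text)
# The lookahead rejects a ' with no word character on either side
strict_hits = re.findall(r"\w*(?!\B'\B)'\w*", text)

print(loose_hits)   # ["'", "don't"]
print(strict_hits)  # ["don't"]
```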

Answer 3

You were almost there. Try this:

(?:\w+)?'(?:\w+)?

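(?:\w+) => the ?: makes the group non-capturing, and \w+ matches a word character one to unlimited times. The ? after the group matches the preceding token zero or one time, which is what makes the group optional.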

https://regex101.com/r/N8Y9cQ/1
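
The distinction between ?: and a trailing ? can be checked directly. This is a minimal sketch with an invented `text` string and variable names; it also shows how capturing groups change what re.findall returns, which is why non-capturing groups are convenient here.

```python
import re

# Invented short input for the comparison
text = "I've 'had not'"

# ?: only stops the group from capturing; \w+ is still required on both sides
required_hits = re.findall(r"(?:\w+)'(?:\w+)", text)

# A ? after each group makes that group optional
optional_hits = re.findall(r"(?:\w+)?'(?:\w+)?", text)

# With capturing groups, findall returns tuples of the groups instead of whole matches
captured_hits = re.findall(r"(\w+)'(\w+)", text)

print(required_hits)  # ["I've"]
print(optional_hits)  # ["I've", "'had", "not'"]
print(captured_hits)  # [('I', 've')]
```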

python

regex

pandas