Regex - how to select a word that has a '-' in it?

Question

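I am learning regular expressions, so apologies for the simple question.

I want to select words that contain a '-' (minus sign), but not at the beginning or the end of the word.

I tried this (using findall):

r'\b-\b'

on

str = 'word semi-column peace'

but, of course, I only got:

['-']

Thanks!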

Answer 1

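str is a built-in name, so it is better not to use it as a variable name.

import re

st = 'word semi-column peace'
# \w+ matches the word characters before the hyphen, -\w+ the hyphen and the word characters after it
print(re.findall(r"\b\w+-\w+\b", st))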

['semi-column']

Answer 2

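What you actually want is a regular expression like this:

\w+-\w+

This means: match one or more word characters (letters, digits, or underscore), as the "+" indicates, then a "-", then one or more word characters again, as the second "+" indicates.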
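
As a quick check of that description, here is a minimal sketch using findall. The sample text is made up for illustration; the token x_1-y2 is included only to show that \w also covers digits and underscores.

```python
import re

# Hypothetical sample text; x_1-y2 shows that \w covers digits and underscores too
st = 'word semi-column x_1-y2 peace'

# \w+ : one or more word characters, - : a literal hyphen, \w+ : word characters again
matches = re.findall(r'\w+-\w+', st)
print(matches)  # ['semi-column', 'x_1-y2']
```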

Answer 3

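You could try something like this: centering on the hyphen, I match in both directions until whitespace appears. I also check whether the word is surrounded by hyphens (e.g. -test-cats-), and if so, I make sure not to include them. The regex should also work with findall.

import re

st = 'word semi-column peace'
# [^\s-]+ matches a run of characters that are neither whitespace nor a hyphen
m = re.search(r'([^\s-]+-[^\s-]+)', st)
if m:
    print(m.group(1))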
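
Since the answer mentions findall, here is a small sketch of that variant. It assumes the character class [^\s-] (anything that is neither whitespace nor a hyphen), and the sample text with -test-cats- is made up for illustration.

```python
import re

# Hypothetical sample text including a word wrapped in hyphens
st = 'word semi-column -test-cats- peace'

# Without a capturing group, findall returns the full text of each match
matches = re.findall(r'[^\s-]+-[^\s-]+', st)
print(matches)  # ['semi-column', 'test-cats']
```

Note that the surrounding hyphens are dropped from -test-cats-, but the inner part is still reported as a match.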

Answer 4

a '-' (minus sign) in it but not at the beginning and not at the end of the word

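Since "-" is not a word character, you cannot use word boundaries (\b) to prevent matching words that start or end with a hyphen. In a string like "-not-wanted-", both \b\w+-\w+\b and \w+-\w+ still match the inner "not-wanted".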
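
A short sketch that shows this behaviour with findall (the variable names are only illustrative):

```python
import re

text = '-not-wanted-'

# \b sits between '-' and 'n', and between 'd' and '-', so the boundaries do not help
with_boundaries = re.findall(r'\b\w+-\w+\b', text)
without_boundaries = re.findall(r'\w+-\w+', text)

print(with_boundaries)     # ['not-wanted']
print(without_boundaries)  # ['not-wanted']
```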

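We need to add an extra condition before and after the word:

Before: (?<![-\w]) not preceded by a hyphen or a word character.
After: (?![-\w]) not followed by a hyphen or a word character.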

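Also, a word may contain more than one hyphen, and we need to allow for that. What we can do here is repeat the last part of the word ("hyphen and word characters") one or more times:

\w+(?:-\w+)+ matches:
- \w+ one or more word characters
- (?:-\w+)+ a hyphen followed by one or more word characters, with this last part allowed to repeat.

Regex: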

(?<![-\w])\w+(?:-\w+)+(?![-\w])

Code:

import re

pattern = re.compile(r'(?<![-\w])\w+(?:-\w+)+(?![-\w])')
text = "-abc word semi-column peace -not-wanted- one-word dont-match- multi-hyphenated-word"

result = re.findall(pattern, text)
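
To see what the pattern keeps and what it rejects, here is a self-contained sketch that reuses the same pattern and sample text and prints the result. Calling findall on the compiled pattern is just a stylistic choice here.

```python
import re

pattern = re.compile(r'(?<![-\w])\w+(?:-\w+)+(?![-\w])')
text = "-abc word semi-column peace -not-wanted- one-word dont-match- multi-hyphenated-word"

# The group is non-capturing, so findall returns the whole matched words
result = pattern.findall(text)
print(result)  # ['semi-column', 'one-word', 'multi-hyphenated-word']
```

Words with a leading or trailing hyphen (-abc, -not-wanted-, dont-match-) are skipped entirely, and so are words without any hyphen.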

Answer 5

You can also use the following regex:

>>> import re
>>> st = "word semi-column peace"
>>> print(re.findall(r"\S+-\S+", st))
['semi-column']
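
One caveat, shown as a small sketch: \S matches any non-whitespace character, hyphens included, so this pattern also accepts words that begin or end with a hyphen. The sample text below is made up for illustration.

```python
import re

# Hypothetical sample text with a word wrapped in hyphens
st = 'word semi-column -not-wanted- peace'

# \S+ can consume hyphens, so the leading and trailing '-' stay in the match
matches = re.findall(r'\S+-\S+', st)
print(matches)  # ['semi-column', '-not-wanted-']
```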
