Python re expression returning whole string, but groups not providing whole string

Question

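I am trying to write a regular expression in Python that matches certain elements of a user-input string. So far I have re.match("( 0b[10]+| [0-9]+| '.+?'| \".+?\")+", user_cmd).

When user_cmd = ' 12 0b110110 \' \' " " "str" \'str\'', re.match("( 0b[10]+| [0-9]+| '.+?'| \".+?\")+", user_cmd) returns <re.Match object; span=(0, 32), match=' 12 0b110110 \' \' " " "str" \'str\''>, which is the whole string. Since everything matches and everything in the regex is inside the parentheses, I expected everything to end up in the group. It does not: re.match("( 0b[10]+| [0-9]+| '.+?'| \".+?\")+", user_cmd).groups() returns (" 'str'",), which is a single item. Why is that? How can I make the regex return every matched item from groups()?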

Answer 1

For your problem, I suggest using the findall method.

import re

user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
# No group and no trailing quantifier: each alternative matches one item.
matches = re.findall(r" 0b[10]+| [0-9]+| '.+?'| \".+?\"", user_cmd)

print(matches)

My result:

[' 12', ' 0b110110', " ' '", ' " "', ' "str"', " 'str'"]

Hope this helps :)

Answer 2

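Your pattern repeats a capturing group. A repeated capturing group keeps only the value of its last iteration, so group 1 ends up holding " 'str'".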
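As a minimal sketch of that behaviour, using the pattern and input from the question (the variable names pattern and m are only illustrative):

```python
import re

user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
pattern = "( 0b[10]+| [0-9]+| '.+?'| \".+?\")+"

m = re.match(pattern, user_cmd)

# group(0) is the overall match: every iteration of the repeated group.
print(repr(m.group(0)))

# groups() holds one slot per capturing group, and that slot is
# overwritten on each iteration, so only the last item survives.
print(m.groups())
```

The overall match covers the whole string, while group 1 only remembers what it matched last.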

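If you want the separate matches, you don't need to repeat a group, and if you only want the matches themselves, you don't need a capturing group at all.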
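One reason the capturing group matters with re.findall: when a pattern contains exactly one capturing group, findall returns the group's text rather than the whole match. A small sketch on a shortened, made-up input (text, with_group and without_group are illustrative names):

```python
import re

text = " 12 0b11"

# With a capturing group, findall returns only what the group captured,
# so the leading space (outside the group) is dropped.
with_group = re.findall(r" (0b[10]+|[0-9]+)", text)

# With a non-capturing group, findall returns each whole match.
without_group = re.findall(r" (?:0b[10]+|[0-9]+)", text)

print(with_group)
print(without_group)
```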

Since all the parts start with a space, what you might do is match a space and then use a non-capturing group with the alternation |.

You can use a negated character class instead of the non-greedy quantifier .+? to reduce backtracking.
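Besides backtracking less, a negated character class can never run past a closing quote, whereas a lazy .+? will do so whenever the rest of the pattern forces it to. A small illustration on a made-up input, using re.fullmatch to force the match to reach the end of the string:

```python
import re

text = "'a' 'b'"

# The lazy quantifier keeps expanding until the overall match succeeds,
# so it swallows the inner quotes and matches the whole string.
lazy = re.fullmatch(r"'.+?'", text)

# [^'] cannot consume a quote, so the match must end at the first
# closing quote, and fullmatch fails.
negated = re.fullmatch(r"'[^']+'", text)

print(lazy)
print(negated)
```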

 (?:0b[10]+|[0-9]+|'[^']+'|"[^"]+")

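The space before (?: matches a literal space, and (?: starts a non-capturing group for the alternation |
- 0b[10]+ Match 0b followed by 1+ occurrences of 1 or 0
- | Or
- [0-9]+ Match 1+ digits 0-9
- | Or
- '[^']+' Match from ' to ' using a negated character class, which matches any character except ' 1+ times
- | Or
- "[^"]+" The same for double quotes, using another negated character class
) Close the non-capturing group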
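The breakdown above can also be kept next to the pattern itself with re.VERBOSE. This is only a sketch of one way to write it; the name ITEM_RE is illustrative, and the leading space is written as [ ] because verbose mode ignores bare whitespace in the pattern:

```python
import re

ITEM_RE = re.compile(r"""
    [ ]              # a literal space (bare whitespace is ignored here)
    (?:              # start a non-capturing group for the alternation
        0b[10]+      # 0b followed by one or more 1 or 0
      | [0-9]+       # or one or more digits
      | '[^']+'      # or a single-quoted item
      | "[^"]+"      # or a double-quoted item
    )                # close the non-capturing group
""", re.VERBOSE)

user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
items = ITEM_RE.findall(user_cmd)
print(items)
```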


For example, using re.findall to get all the matches:

import re
 
user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
pattern = r" (?:0b[10]+|[0-9]+|'[^']+'|\"[^\"]+\")"
 
print(re.findall(pattern, user_cmd))

Output

[' 12', ' 0b110110', " ' '", ' " "', ' "str"', " 'str'"]

If you want to keep the single full match and still get every iteration of the group, you can use captures() from the PyPI regex module

import regex

pattern = r"""( (?:0b[10]+|[0-9]+|'[^']+'|\"[^\"]+\"))+"""
user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
m = regex.match(pattern, user_cmd)
print(m.captures(1))

Output

[' 12', ' 0b110110', " ' '", ' " "', ' "str"', " 'str'"]
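If installing the third-party regex module is not an option, a similar result can be approximated with the standard re module alone: check the whole string with re.fullmatch, then pull the items out with re.findall. This is a sketch under that assumption, not part of the original answer; ITEM and parse_items are hypothetical names.

```python
import re

# One item: a space followed by a binary literal, digits, or a quoted string.
ITEM = r" (?:0b[10]+|[0-9]+|'[^']+'|\"[^\"]+\")"


def parse_items(user_cmd):
    """Return the list of items, or None if the string is not made
    entirely of valid items."""
    # Validate the whole string first; the outer group is non-capturing.
    if re.fullmatch(f"(?:{ITEM})+", user_cmd) is None:
        return None
    # Then extract each item individually.
    return re.findall(ITEM, user_cmd)


user_cmd = ' 12 0b110110 \' \' " " "str" \'str\''
print(parse_items(user_cmd))
print(parse_items(" 12 abc"))
```

The second call returns None because " abc" is not a valid item, so the whole-string check fails before any extraction happens.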

