How to extract arguments using regex?

Question

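I want to extract arguments (of the command-line kind) using a regular expression. I take a string as input and want to get each argument as a group.

Basically, I want a set in the regular expression that both excludes and includes some characters.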

import re

ppatt=r"( --(?P<param>([^( --)]*)))"
a=[x.group("param") for x in re.finditer(ppatt,"command --m=psrmcc;ld -  --kkk gtodf --klfj")]
print(a)

I expect the output to be

['m=psrmcc;ld - ', 'kkk gtodf', 'klfj']

But the output is

['m=psrmcc;ld', 'kkk', 'klfj']

Answer 1

You can use re.split.

For example:

import re

print(re.split(r"--", "command --m=psrmcc;ld -  --kkk gtodf --klfj")[1:])
#or
print("command --m=psrmcc;ld -  --kkk gtodf --klfj".split("--")[1:])

Output:

['m=psrmcc;ld -  ', 'kkk gtodf ', 'klfj']
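Note that splitting on "--" keeps the whitespace in front of each separator, so the result differs slightly from the list asked for in the question. If the separator is assumed to be exactly one space followed by "--", a lazy match with a lookahead is one possible sketch. It also shows why the original attempt stopped early: [^( --)] is a single character class, not "anything except the string ' --'", and inside the brackets " --" is read as the range from space to hyphen. The names text, char_class, matched_chars, pattern and params are only illustrative.

```python
import re

text = "command --m=psrmcc;ld -  --kkk gtodf --klfj"

# The class from the question excludes every character from ' ' to '-',
# which includes '(', ')', '+' and the space itself.
char_class = re.compile(r"[^( --)]")
matched_chars = [c for c in " -()+" if char_class.match(c)]
print(matched_chars)  # [] : none of these characters is allowed by the class

# Assumption: an argument starts after " --" and runs up to the next " --"
# or to the end of the string. .*? is lazy, and the lookahead (?= --|$)
# checks for the next separator without consuming it.
pattern = re.compile(r" --(?P<param>.*?)(?= --|$)")
params = [m.group("param") for m in pattern.finditer(text)]
print(params)  # ['m=psrmcc;ld - ', 'kkk gtodf', 'klfj']
```

Because the lookahead does not consume the separator, the next match can start at that same " --". If arguments may contain newlines, re.DOTALL would also be needed.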

Answer 2

We could approach this problem with a character list bounded by word boundaries, perhaps with an expression similar to:

(?:.+?)(\b[A-Za-z=;\s]+\b)

If we wish to allow more characters, we would add them to:

[A-Za-z=;\s]

Here, we use a non-capturing group so that the unwanted characters are not captured:

(?:.+?)

Then we collect the desired characters in a capturing group, which we can simply refer to as \1:

(\b[A-Za-z=;\s]+\b)

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?:.+?)(\b[A-Za-z=;\s]+\b)"

test_str = "command --m=psrmcc;ld -  --kkk gtodf --klfj"

subst = r"\1\n"  # raw string, so \1 is a backreference rather than the character \x01

# You can limit the number of replacements with the count argument (0 means replace all)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
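To see what the capturing group picks up without going through the substitution, the same expression can be run with re.finditer. This is only a sketch for inspecting the pattern on the test string; the name captured is illustrative, and the pieces it prints are not guaranteed to match the list requested in the question.

```python
import re

regex = r"(?:.+?)(\b[A-Za-z=;\s]+\b)"
test_str = "command --m=psrmcc;ld -  --kkk gtodf --klfj"

# Collect only group 1, i.e. the part matched by the character list.
captured = [m.group(1) for m in re.finditer(regex, test_str)]

for piece in captured:
    # repr() makes leading or trailing whitespace visible
    print(repr(piece))
```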

RegEx Circuit

jex.im visualizes regular expressions.
