How to split a string into command line arguments like the shell in python?

I have command line arguments in a string and I need to split them to feed them to argparse.ArgumentParser.parse_args.

I see that the documentation uses string.split() extensively. However, this does not work in more complex cases, such as

--foo "spaces in brakets"  --bar escaped\ spaces

Is there a function in Python that does this?

(A similar question was asked for Java.)

You can use the split_arg_string helper function from the click package:

import re

def split_arg_string(string):
    """Given an argument string this attempts to split it into small parts."""
    rv = []
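    # Match a single-quoted string, a double-quoted string (both allowing
    # backslash escapes), or a bare run of non-whitespace characters
    for match in re.finditer(r"('([^'\\]*(?:\\.[^'\\]*)*)'"
                             r'|"([^"\\]*(?:\\.[^"\\]*)*)"'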
                             r'|\S+)\s*', string, re.S):
        arg = match.group().strip()
        if arg[:1] == arg[-1:] and arg[:1] in '"\'':
            arg = arg[1:-1].encode('ascii', 'backslashreplace') \
                .decode('unicode-escape')
        try:
            arg = type(string)(arg)
        except UnicodeError:
            pass
        rv.append(arg)
    return rv

For example:

>>> print(split_arg_string(r'"this is a test" 1 2 "1 \" 2"'))
['this is a test', '1', '2', '1 " 2']

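The click package is becoming the dominant choice for command-line argument parsing, but I don't think it supports parsing arguments from a string (only from argv). The helper function above is only used for bash completion.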

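Edit: I can only recommend using shlex.split() as suggested in @ShadowRanger's answer. The only reason I'm not deleting this answer is that it is faster than the full-fledged pure-Python tokenizer used in shlex (about 3.5 times faster for the example above: 5.9 µs vs. 20.5 µs). However, that should not be a reason to prefer it over shlex.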

This is exactly what shlex.split was created for.
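As a minimal sketch, here is the command line from the question split with shlex.split and passed to argparse. The `--foo`/`--bar` options come from the question; the parser definition itself is illustrative only.

```python
import argparse
import shlex

# The command line from the question, held in a single string
cmdline = r'--foo "spaces in brakets"  --bar escaped\ spaces'

# shlex.split applies POSIX shell rules: quotes group words and a
# backslash escapes the following space
argv = shlex.split(cmdline)
print(argv)  # ['--foo', 'spaces in brakets', '--bar', 'escaped spaces']

# Illustrative parser; parse_args accepts an explicit list instead of sys.argv[1:]
parser = argparse.ArgumentParser()
parser.add_argument('--foo')
parser.add_argument('--bar')
args = parser.parse_args(argv)
print(args.foo, '|', args.bar)  # spaces in brakets | escaped spaces
```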

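If you are parsing a Windows-style command line, shlex.split will not work correctly: calling subprocess functions on its result will not behave the same as passing the string directly to the shell.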

In that case, the most reliable way to split a string the way Python splits its own command-line arguments is... to pass the command line to Python:

import os
import sys
import subprocess
import shlex
import json  # json is an easy way to send arbitrary ascii-safe lists of strings out of python

def shell_split(cmd):
    """
    Like `shlex.split`, but uses the Windows splitting syntax when run on Windows.

    On Windows, this is the inverse of subprocess.list2cmdline.
    """
    if os.name == 'posix':
        return shlex.split(cmd)
    else:
        # TODO: write a version of this that doesn't invoke a subprocess
        if not cmd:
            return []
        full_cmd = '{} {}'.format(
            subprocess.list2cmdline([
                sys.executable, '-c',
                'import sys, json; print(json.dumps(sys.argv[1:]))'
            ]), cmd
        )
        ret = subprocess.check_output(full_cmd).decode()
        return json.loads(ret)

An example of how they differ:

# Windows does not treat all backslashes as escapes (run on Windows)
>>> shell_split(r'C:\Users\me\some_file.txt "file with spaces"')
['C:\\Users\\me\\some_file.txt', 'file with spaces']

# POSIX does
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"')
['C:Usersmesome_file.txt', 'file with spaces']

# non-POSIX mode does not mean Windows: this keeps the quotes
>>> shlex.split(r'C:\Users\me\some_file.txt "file with spaces"', posix=False)
['C:\\Users\\me\\some_file.txt', '"file with spaces"']
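The TODO in shell_split asks for a version that does not invoke a subprocess. Below is a hypothetical pure-Python sketch (`windows_split` is not part of any library) that inverts subprocess.list2cmdline using the backslash and quote rules that function is written against. It is an approximation: it does not model every MS C runtime quirk, for example `""` inside a quoted argument or the special parsing of the program name, so treat it as a starting point rather than a drop-in replacement.

```python
import subprocess


def windows_split(cmd):
    """Hypothetical splitter approximating MS C runtime argument rules.

    Intended as the inverse of subprocess.list2cmdline; not a full CRT clone.
    """
    args = []
    buf = []
    in_quotes = False
    have_arg = False  # True once the current argument has started ('""' -> '')
    i, n = 0, len(cmd)
    while i < n:
        c = cmd[i]
        if c == '\\':
            # Measure the run of backslashes
            j = i
            while j < n and cmd[j] == '\\':
                j += 1
            num = j - i
            if j < n and cmd[j] == '"':
                # 2n backslashes + quote -> n backslashes, quote toggles quoting
                # 2n+1 backslashes + quote -> n backslashes and a literal quote
                buf.append('\\' * (num // 2))
                if num % 2:
                    buf.append('"')
                    i = j + 1
                else:
                    i = j  # leave the quote for the next iteration
            else:
                # Backslashes not followed by a quote are literal
                buf.append('\\' * num)
                i = j
            have_arg = True
        elif c == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif c in ' \t' and not in_quotes:
            # Unquoted whitespace ends the current argument
            if have_arg:
                args.append(''.join(buf))
                buf = []
                have_arg = False
            i += 1
        else:
            buf.append(c)
            have_arg = True
            i += 1
    if have_arg:
        args.append(''.join(buf))
    return args


# Round-trip check against subprocess.list2cmdline (importable on every platform)
sample = ['C:\\Users\\me\\some_file.txt', 'file with spaces', '', 'a"b', 'tr ail\\']
assert windows_split(subprocess.list2cmdline(sample)) == sample

print(windows_split(r'C:\Users\me\some_file.txt "file with spaces"'))
# ['C:\\Users\\me\\some_file.txt', 'file with spaces']
```

Checking against list2cmdline round trips is a cheap way to gain some confidence, but on Windows the subprocess-based shell_split remains the reference behaviour.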