How to use a specific stop-character for shlex.split?

Question

How can I tell shlex that once it finds the character ;, it should not split anything after it?

Example:

shlex.split("""hello "column number 2" foo ; bar baz""")

should give

["hello", "column number 2", "foo", "; bar baz"]

instead of ["hello", "column number 2", "foo", ";", "bar", "baz"].

More generally, is there a way to define "comment" delimiters with shlex? That is,

shlex.split("""hello "column number 2" foo ;this is a comment; "last one" bye """)

should give

["hello", "column number 2", "foo", ";this is a comment;", "last one", "bye"]

Answer 1

The shlex parser provides an option for specifying comment characters, but it is not exposed through the simplified shlex.split interface. Example:

import shlex

a = 'hello "bla bla" ; this is a comment'

lex = shlex.shlex(a, posix=True)
lex.commenters = ';'
print(list(lex))  # ['hello', 'bla bla']

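Here is a slightly extended split function, mostly copied from the Python standard library, with a small change to the comments parameter so that it can specify the comment characters: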

import shlex
def shlex_split(s, comments='', posix=True):
    """Split the string *s* using shell-like syntax."""
    if s is None:
        import warnings
        warnings.warn("Passing None for 's' to shlex.split() is deprecated.",
                      DeprecationWarning, stacklevel=2)
    lex = shlex.shlex(s, posix=posix)
    lex.whitespace_split = True
    if isinstance(comments, str):
        lex.commenters = comments
    elif not comments:
        lex.commenters = ''
    return list(lex)

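You might want to change the default value of comments in the code above; as written, it has the same default as shlex.split, which is to not recognise comments at all. (Parser objects created by shlex.shlex default to # as the comment character, which is what you get if you specify comments=True. I preserved this behaviour for compatibility.)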
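For comparison, the standard shlex.split already takes a boolean comments flag, but it only switches the parser's default # commenter on or off. A minimal illustration, assuming CPython's standard shlex module (the sample strings are made up for this sketch):

```python
import shlex

line_hash = 'hello "bla bla" # note'
line_semi = 'hello "bla bla" ; note'

# Default: comment handling is off, so '#' is just another token.
plain = shlex.split(line_hash)

# comments=True keeps the parser's default commenter, which is '#'.
hash_stripped = shlex.split(line_hash, comments=True)

# ';' is not a commenter by default, so it survives even with comments=True.
semi_kept = shlex.split(line_semi, comments=True)

print(plain)          # ['hello', 'bla bla', '#', 'note']
print(hash_stripped)  # ['hello', 'bla bla']
print(semi_kept)      # ['hello', 'bla bla', ';', 'note']
```

This is why a custom wrapper is needed as soon as the comment character is anything other than #.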

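Note that comments are ignored; they do not appear in the result list at all. When the parser hits a comment character, it discards the rest of the line, which for a single-line string means it stops parsing. (So there will never be two comments on one line.) The comments string is a set of possible comment characters, not a comment sequence. So if you want to recognise both # and ; as comment characters, specify comments='#;'.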
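A small sketch of that behaviour, setting the commenters attribute on the parser directly. The helper name tokens is hypothetical and only used for this illustration; it is not part of shlex:

```python
import shlex

def tokens(s, commenters):
    """Tokenise s shell-style, treating each character in commenters as a comment start."""
    lex = shlex.shlex(s, posix=True)
    lex.whitespace_split = True
    lex.commenters = commenters  # a set of characters, not a multi-character marker
    return list(lex)

# Either character on its own starts a comment.
print(tokens('a b # c', '#;'))           # ['a', 'b']
print(tokens('a b ; c', '#;'))           # ['a', 'b']

# A comment only runs to the end of its line; parsing resumes on the next line.
print(tokens('a ; x\nb # y\nc', '#;'))   # ['a', 'b', 'c']
```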

Here is a sample run:

>>> # Default behaviour is the same as shlex.split
>>> shlex_split("""hello "column number 2" foo ; bar baz""") 
['hello', 'column number 2', 'foo', ';', 'bar', 'baz']
>>> # Supply a comments parameter to specify a comment character 
>>> shlex_split("""hello "column number 2" foo ; bar baz""", comments=';') 
['hello', 'column number 2', 'foo']
>>> shlex_split("""hello "column number 2" foo ;this is a comment; "last one" bye """, comments=';')
['hello', 'column number 2', 'foo']
>>> # The ; is recognised as a comment even if it is not preceded by whitespace.
>>> shlex_split("""hello "column number 2" foo;this is a comment; "last one" bye """, comments=';')
['hello', 'column number 2', 'foo']
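The comment mechanism drops everything from the ; onwards. If the remainder should instead be kept as one final token, as in the first example of the question, shlex has no switch for that as far as I know. One possible workaround is to locate the first unquoted, unescaped stop character by hand and let shlex.split handle only the part before it. The function name split_until and its simplified quote and backslash handling are assumptions of this sketch, not something shlex provides:

```python
import shlex

def split_until(s, stop=';'):
    """Split s shell-style up to the first unquoted stop character.

    Everything from the stop character onwards is returned verbatim
    as a single final token.
    """
    quote = None      # the quote character currently open, if any
    escaped = False   # True when the previous character was a backslash
    for i, ch in enumerate(s):
        if escaped:
            escaped = False            # this character is escaped; skip it
        elif ch == '\\' and quote != "'":
            escaped = True             # backslashes are literal inside single quotes
        elif quote:
            if ch == quote:
                quote = None           # closing quote
        elif ch in '\'"':
            quote = ch                 # opening quote
        elif ch == stop:
            return shlex.split(s[:i]) + [s[i:]]
    return shlex.split(s)              # no stop character found

print(split_until('hello "column number 2" foo ; bar baz'))
# ['hello', 'column number 2', 'foo', '; bar baz']
print(split_until('a "b;c" d'))
# ['a', 'b;c', 'd']
```

A stop character inside quotes or after a backslash is left to shlex.split, so it is treated as ordinary text.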
