How to use a specific stop-character for shlex.split?
How can I tell shlex that, once it finds the character ;, it should not split anything after it?
Example:
shlex.split("""hello "column number 2" foo ; bar baz""")
should give
["hello", "column number 2", "foo", "; bar baz"]
rather than ["hello", "column number 2", "foo", ";", "bar", "baz"].
More generally, is there a way to define "comment" delimiters with shlex? That is,
shlex.split("""hello "column number 2" foo ;this is a comment; "last one" bye """)
should give
["hello", "column number 2", "foo", ";this is a comment;", "last one", "bye"]
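The shlex parser provides an option for specifying the comment character(s), but it is not available through the simplified shlex.split interface. Example: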
import shlex
a = 'hello "bla bla" ; this is a comment'
lex = shlex.shlex(a, posix=True)
lex.commenters = ';'
print(list(lex)) # ['hello', 'bla bla']
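One caveat about the snippet above: a bare shlex.shlex object does not split on whitespace alone, so characters such as - end up as separate tokens unless whitespace_split is set. The following is only a small sketch to show the difference; the helper name tokens is made up for illustration.

```python
import shlex

def tokens(s, whitespace_split):
    # Hypothetical helper: tokenize s, treating ';' as the comment character.
    lex = shlex.shlex(s, posix=True)
    lex.commenters = ';'
    lex.whitespace_split = whitespace_split
    return list(lex)

# Without whitespace_split, non-word characters become their own tokens.
print(tokens('ls -l a-b ; comment', whitespace_split=False))
# ['ls', '-', 'l', 'a', '-', 'b']

# With whitespace_split, tokens are separated by whitespace only.
print(tokens('ls -l a-b ; comment', whitespace_split=True))
# ['ls', '-l', 'a-b']
```

This is why a shlex.split replacement would normally set whitespace_split = True before iterating.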
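Here is a slightly extended split function, mostly copied from the Python standard library, with a small change to the comments parameter so that it can specify the comment characters: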
import shlex
def shlex_split(s, comments='', posix=True):
    """Split the string *s* using shell-like syntax."""
    if s is None:
        import warnings
        warnings.warn("Passing None for 's' to shlex.split() is deprecated.",
                      DeprecationWarning, stacklevel=2)
    lex = shlex.shlex(s, posix=posix)
    lex.whitespace_split = True
    if isinstance(comments, str):
        lex.commenters = comments
    elif not comments:
        lex.commenters = ''
    return list(lex)
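You might want to change the default value of comments in the code above; as written, it has the same default as shlex.split, which is to not recognise comments at all. (The parser objects created by shlex.shlex default to # as the comment character, which is what you get if you specify comments=True. I preserved this behaviour for compatibility.)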
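Note that comments are ignored; they do not appear in the result list at all. When the parser hits a comment character, it discards the rest of the line, so for single-line input it simply stops parsing. (So there can never be two comments on one line.) The comments string is a list of possible comment characters, not a comment sequence. So if you want to recognise both # and ; as comment characters, specify comments='#;'.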
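To illustrate the point about the comments string being a set of characters, here is a small sketch that sets commenters directly on a shlex.shlex object. The helper name split_ignoring is made up for this example; it only mirrors what the function above does when given a string.

```python
import shlex

def split_ignoring(s, commenters):
    # Hypothetical helper: split on whitespace, dropping anything after
    # any one of the characters in `commenters` up to the end of the line.
    lex = shlex.shlex(s, posix=True)
    lex.whitespace_split = True
    lex.commenters = commenters
    return list(lex)

# Each character in the string is a comment character on its own.
print(split_ignoring('a b # c d', '#;'))         # ['a', 'b']
print(split_ignoring('a b ; c d', '#;'))         # ['a', 'b']

# A comment only runs to the end of its line, so parsing resumes afterwards.
print(split_ignoring('a ; b\nc # d\ne', '#;'))   # ['a', 'c', 'e']
```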
Here is a sample run of shlex_split:
>>> # Default behaviour is the same as shlex.split
>>> shlex_split("""hello "column number 2" foo ; bar baz""")
['hello', 'column number 2', 'foo', ';', 'bar', 'baz']
>>> # Supply a comments parameter to specify a comment character
>>> shlex_split("""hello "column number 2" foo ; bar baz""", comments=';')
['hello', 'column number 2', 'foo']
>>> shlex_split("""hello "column number 2" foo ;this is a comment; "last one" bye """, comments=';')
['hello', 'column number 2', 'foo']
>>> # The ; is recognised as a comment even if it is not preceded by whitespace.
>>> shlex_split("""hello "column number 2" foo;this is a comment; "last one" bye """, comments=';')
['hello', 'column number 2', 'foo']
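The comment mechanism drops everything from the ; onward, whereas the question asked for the remainder to be kept as one unsplit token. shlex has no built-in option for that as far as I know, so the following is only a rough sketch: the function split_until is hypothetical, and its hand-written scanner merely approximates shlex's POSIX quoting rules to locate the first unquoted, unescaped stop character. Everything before it goes through the ordinary shlex.split.

```python
import shlex

def split_until(s, stop=';'):
    """Split s shell-style up to the first unquoted stop character.

    Everything from the stop character onward is returned as one final token.
    (Illustrative helper, not part of the shlex module.)
    """
    quote = None      # quote character we are currently inside, if any
    escaped = False   # True when the previous character was a backslash escape
    for i, ch in enumerate(s):
        if escaped:
            escaped = False          # this character is escaped; skip it
        elif ch == '\\' and quote != "'":
            escaped = True           # backslash is literal only inside '...'
        elif quote:
            if ch == quote:
                quote = None         # closing quote
        elif ch in '"\'':
            quote = ch               # opening quote
        elif ch == stop:
            return shlex.split(s[:i]) + [s[i:].rstrip()]
    return shlex.split(s)            # no stop character found

print(split_until('hello "column number 2" foo ; bar baz'))
# ['hello', 'column number 2', 'foo', '; bar baz']
print(split_until('a ";" b ; c d'))
# ['a', ';', 'b', '; c d']
```

A quoted or backslash-escaped ; is left alone, so only a bare ; ends the splitting.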