pyparsing delimitedList(..., combine=True) giving inconsistent result
I'm using pyparsing==2.1.5 on Python 3.4, and I'm getting a strange result:
from pyparsing import Word, alphanums, delimitedList

word = Word(alphanums)
word_list_no_combine = delimitedList(word, combine=False)
word_list_combine = delimitedList(word, combine=True)
print(word_list_no_combine.parseString('one, two'))  # ['one', 'two']
print(word_list_no_combine.parseString('one,two'))   # ['one', 'two']
print(word_list_combine.parseString('one, two'))     # ['one']: ODD ONE OUT
print(word_list_combine.parseString('one,two'))      # ['one,two']
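I don't understand why the "combine" option causes part of the list to be swallowed when a space is present, but not when the space is absent. Is this a pyparsing bug, or am I missing something obvious?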
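It looks like this is due to the behavior of Combine(), specifically its default "adjacent=True" option, which delimitedList() relies on. With adjacent=True, whitespace is not skipped between the combined tokens, so in 'one, two' the repetition after 'one' fails at the space and only 'one' is returned. The relevant source: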
class Combine(TokenConverter):
    """Converter to concatenate all matching tokens to a single string.
       By default, the matching patterns must also be contiguous in the input string;
       this can be disabled by specifying C{'adjacent=False'} in the constructor.
    """
    def __init__( self, expr, joinString="", adjacent=True ):
        # ...

def delimitedList( expr, delim=",", combine=False ):
    # ...
    dlName = _ustr(expr)+" ["+_ustr(delim)+" "+_ustr(expr)+"]..."
    if combine:
        return Combine( expr + ZeroOrMore( delim + expr ) ).setName(dlName)
    else:
        return ( expr + ZeroOrMore( Suppress( delim ) + expr ) ).setName(dlName)
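To see the effect of adjacent in isolation, here is a minimal sketch that builds the same kind of expression delimitedList() builds and wraps it in Combine() both ways. The helper name make_list_body is made up for this example, and the camelCase method names assume the pyparsing 2.x style API used in this thread.

```python
from pyparsing import Combine, Word, ZeroOrMore, alphanums


def make_list_body():
    # Hypothetical helper: a fresh "word [, word]..." expression on each call,
    # so the two Combine() wrappers below do not share any state.
    word = Word(alphanums)
    return word + ZeroOrMore("," + word)


# adjacent=True is the default: no whitespace may appear between the tokens.
strict = Combine(make_list_body())
# adjacent=False: whitespace between the tokens is skipped as usual.
relaxed = Combine(make_list_body(), adjacent=False)

strict_result = strict.parseString("one, two").asList()
relaxed_result = relaxed.parseString("one, two").asList()

print(strict_result)   # ['one']
print(relaxed_result)  # ['one,two']
```

Note that the strict parse does not raise an error. The repetition simply matches zero times, and parseString() returns whatever was matched so far.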
So you can work around it with a replacement that passes an adjacent setting through to Combine():
from pyparsing import Combine, Suppress, ZeroOrMore

def delimitedListPlus(expr, delim=",", combine=False, combine_adjacent=False):
    dlName = str(expr) + " [" + str(delim) + " " + str(expr) + "]..."
    if combine:
        # Unlike delimitedList(), let the caller decide whether tokens must be adjacent.
        return Combine(expr + ZeroOrMore(delim + expr),
                       adjacent=combine_adjacent).setName(dlName)
    else:
        return (expr + ZeroOrMore(Suppress(delim) + expr)).setName(dlName)
Rather than modifying pyparsing, I suggest you use a plain, uncombined delimitedList with a custom parse action to do the job:
word_list_combine_using_parse_action = word_list_no_combine.copy().setParseAction(','.join)
print(word_list_combine_using_parse_action.parseString('one, two'))
This prints ['one,two'].
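As a self-contained variant of the same idea, the sketch below wraps the parse-action approach in a small helper so the join string can be chosen by the caller. The name joined_word_list and its separator parameter are made up for this example and are not part of pyparsing.

```python
from pyparsing import Word, alphanums, delimitedList


def joined_word_list(separator=","):
    """Hypothetical helper: a delimited list of words, re-joined with `separator`."""
    word = Word(alphanums)
    # With the default combine=False, whitespace around the delimiter is skipped
    # and the delimiter is suppressed, so the parse action sees only the words.
    return delimitedList(word).setParseAction(lambda tokens: separator.join(tokens))


parser = joined_word_list()
with_space = parser.parseString("one, two").asList()
without_space = parser.parseString("one,two").asList()

print(with_space)     # ['one,two']
print(without_space)  # ['one,two']
```

Both inputs now give the same single token, because the whitespace is consumed by the ordinary uncombined list before the parse action joins the words.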