Simple nested expression matching with pyparsing

Question

I would like to match an expression that looks like this:

(<some value with spaces and m24any crazy signs> (<more values>) <even more>)

I simply want to split these values along the round brackets (). So far, I have managed to cut down the pyparsing S-expression example, which is far too extensive and hard to understand (IMHO).

I used the nestedExpr helper, which reduces it to a single line:

import pyparsing as pp

example = "(<some value with spaces and m24any crazy signs> (<more values>) <even more>)"
parser = pp.nestedExpr(opener='(', closer=')')
print(parser.parseString(example, parseAll=True).asList())

The result is also split at whitespace, which I don't want:

skewed_output = [['<some',
                  'value',
                  'with',
                  'spaces',
                  'and',
                  'm24any',
                  'crazy',
                  'signs>',
                  ['<more', 'values>'],
                  '<even',
                  'more>']]
expected_output = [['<some value with spaces and m24any crazy signs>',
                    ['<more values>'], '<even more>']]
best_output = [['some value with spaces and m24any crazy signs',
                ['more values'], 'even more']]

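Optionally, I'd be glad for any pointers to an understandable introduction on how to build a more detailed parser. I'd like to extract the values between the < > brackets and match them (see best_output), but I can always string.strip() them afterwards.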
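If the values come back with their angle brackets intact (as in expected_output), the strip-afterwards step mentioned above could look like the following sketch. strip_brackets is a hypothetical helper, not part of pyparsing; note that str.strip("<>") removes every leading and trailing < or > character, not just a single pair.

```python
# Hypothetical helper (not part of pyparsing): walk a nested list and
# strip the surrounding angle brackets from every string in it.
def strip_brackets(item):
    if isinstance(item, list):
        # Recurse into sub-lists so the nesting structure is preserved.
        return [strip_brackets(sub) for sub in item]
    # str.strip("<>") drops any run of leading/trailing '<' and '>' characters.
    return item.strip("<>")


expected_output = [['<some value with spaces and m24any crazy signs>',
                    ['<more values>'], '<even more>']]
best_output = strip_brackets(expected_output)
print(best_output)
# [['some value with spaces and m24any crazy signs', ['more values'], 'even more']]
```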

Thanks in advance!

Answer 1

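Pyparsing's nestedExpr takes content and ignoreExpr arguments, which specify what counts as a "single item" of the s-expression. You can pass a QuotedString there. Unfortunately, I don't understand the difference between the two arguments from the docs very well, but some experimentation shows that the following code should do what you want: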

import pyparsing as pp

# Matches the text between < and >; the brackets are dropped from the result.
single_value = pp.QuotedString(quoteChar="<", endQuoteChar=">")
# ignoreExpr=None disables nestedExpr's default ignoreExpr (quotedString).
parser = pp.nestedExpr(opener="(", closer=")",
                       content=single_value,
                       ignoreExpr=None)

example = "(<some value with spaces and m24any crazy signs> (<more values>) <even more>)"
print(parser.parseString(example, parseAll=True))

Output:

[['some value with spaces and m24any crazy signs', ['more values'], 'even more']]

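It expects a list to start with ( and end with ), and to contain any number of optionally whitespace-separated lists or quoted strings. Each quoted string must start with < and end with >, and cannot contain > in between (the first > closes it).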
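The description above can also be written out as an explicit recursive grammar, which may be easier to extend than nestedExpr. The sketch below uses pp.Forward, pp.Group, pp.Suppress and pp.ZeroOrMore; as far as I can tell this is roughly the structure nestedExpr builds internally when content is given and ignoreExpr=None, but treat that as an assumption rather than a description of pyparsing's implementation.

```python
import pyparsing as pp

# Text between < and >, returned without the brackets.
value = pp.QuotedString(quoteChar="<", endQuoteChar=">")

# Forward is a placeholder so the grammar can refer to itself (nesting).
nested = pp.Forward()
nested <<= pp.Group(
    pp.Suppress("(")                 # opening bracket, dropped from results
    + pp.ZeroOrMore(value | nested)  # quoted values or further nested lists
    + pp.Suppress(")")               # closing bracket, dropped from results
)

example = "(<some value with spaces and m24any crazy signs> (<more values>) <even more>)"
result = nested.parseString(example, parseAll=True).asList()
print(result)
# [['some value with spaces and m24any crazy signs', ['more values'], 'even more']]
```

Because each piece is a named element, adding new kinds of items means adding another alternative to the ZeroOrMore rather than searching for the right nestedExpr argument.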

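Playing a bit more with the content and ignoreExpr arguments shows that content=None, ignoreExpr=single_value makes the parser accept both quoted and unquoted strings (splitting unquoted strings at whitespace):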

import pyparsing as pp

single_value = pp.QuotedString(quoteChar="<", endQuoteChar=">")
parser = pp.nestedExpr(opener="(", closer=")", ignoreExpr=single_value, content=None)

example = "(<some value with spaces and m24any crazy signs> (<more values>) <even m<<ore> foo (foo) <(foo)>)"
print(parser.parseString(example, parseAll=True))

Output:

[['some value with spaces and m24any crazy signs', ['more values'], 'even m<<ore', 'foo', ['foo'], '(foo)']]

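Some open questions:

Why does pyparsing ignore the whitespace between consecutive list items?
What is the difference between content and ignoreExpr, and when should each one be used?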
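A small experiment that bears on both questions is sketched below. It rests on my reading of the pyparsing source, which is worth double-checking against your installed version: by default every pyparsing element skips leading whitespace before matching, and inside each group nestedExpr tries ignoreExpr first, then a nested group, then content. Despite its name, whatever ignoreExpr matches is kept in the results, which is why parentheses inside <...> are not treated as nesting.

```python
import pyparsing as pp

# 1) Whitespace: elements skip leading spaces/tabs/newlines by default,
#    so no explicit separator is needed between consecutive words.
words = pp.OneOrMore(pp.Word(pp.alphas))
print(words.parseString("foo   bar\tbaz").asList())
# ['foo', 'bar', 'baz']

# 2) ignoreExpr is tried before nested groups, so "(b)" inside <...>
#    stays part of the quoted value; "c" falls through to the default content.
value = pp.QuotedString(quoteChar="<", endQuoteChar=">")
parser = pp.nestedExpr(opener="(", closer=")", ignoreExpr=value, content=None)
print(parser.parseString("(<a (b)> c)").asList())
# [['a (b)', 'c']]
```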
