Preserve newlines in nestedExpr

Question

Is it possible for nestedExpr to preserve newlines?

Here is a simple example:

import pyparsing as pp

# Parse expressions like: \name{body}
name = pp.Word( pp.alphas )
body = pp.nestedExpr( '{', '}' )
expr = '\\' + name('name') + body('body')  # the backslash has to be escaped

# Example text to parse (raw string, so the backslashes are kept literally)
txt = r'''
This \works{fine}, but \it{
    does not
    preserve newlines
}
'''

# Show results
for e in expr.searchString(txt):
    print('name: ' + e.name)
    print('body: ' + str(e.body) + '\n')

Output:

name: works
body: [['fine']]

name: it
body: [['does', 'not', 'preserve', 'newlines']]

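As you can see, the body of the second expression (\it{ ...) is parsed despite the newlines it contains, but I expected the result to store each line in a separate subarray. As it stands, the result makes it impossible to distinguish single-line from multi-line body contents.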

Answer 1

This extension (based on the code of nestedExpr from version 2.1.10) behaves closer to what I would expect a "nested expression" to return:

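import string
from pyparsing import *

# Python 3 has no basestring; fall back to str so the isinstance checks below still work
try:
    basestring
except NameError:
    basestring = str

# Treat all of string.whitespace as skippable whitespace by default
defaultWhitechars = string.whitespace
ParserElement.setDefaultWhitespaceChars(defaultWhitechars)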

def fencedExpr( opener="(", closer=")", content=None, ignoreExpr=None, stripchars=defaultWhitechars ):

    if content is None:
        if isinstance(opener,basestring) and isinstance(closer,basestring):
            if len(opener) == 1 and len(closer)==1:
                if ignoreExpr is not None:
                    content = Combine(OneOrMore( ~ignoreExpr + CharsNotIn(opener+closer,exact=1)))
                else:
                    content = empty.copy() + CharsNotIn(opener+closer)
            else:
                if ignoreExpr is not None:
                    content = OneOrMore( ~ignoreExpr + ~Literal(opener) + ~Literal(closer))
                else:
                    content = OneOrMore( ~Literal(opener) + ~Literal(closer) )
        else:
            raise ValueError("opening and closing arguments must be strings if no content expression is given")

    if stripchars is not None:
        content.setParseAction(lambda t:t[0].strip(stripchars))

    ret = Forward()
    if ignoreExpr is not None:
        ret <<= Group( Suppress(opener) + ZeroOrMore( ignoreExpr | ret | content ) + Suppress(closer) )
    else:
        ret <<= Group( Suppress(opener) + ZeroOrMore( ret | content )  + Suppress(closer) )
    ret.setName('nested %s%s expression' % (opener,closer))
    return ret

IMHO, it fixes a few issues:

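The original implementation uses ParserElement.DEFAULT_WHITE_CHARS in the default content, apparently out of laziness; it is used only five times outside the ParserElement class itself, four of them in the function nestedExpr (the other usage is in LineEnd, which manually removes \n). It would be easy to add a named parameter to nestedExpr, though to be fair, we can also use ParserElement.setDefaultWhitespaceChars to achieve the same.

The second problem is that, by default, whitespace characters are ignored in the content expression itself, and the attached parse action is lambda t:t[0].strip(), where strip is called without an argument, which means it removes all unicode whitespace characters. I personally think it makes more sense not to ignore any whitespace in the content, and instead to strip it selectively in the result. For this reason, I removed the tokens with CharsNotIn from the original implementation and introduced the stripchars parameter, which defaults to string.whitespace.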
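To make the strip point concrete, here is a small Python 3 sketch using only the standard library. The sample string is made up for illustration; it compares calling strip without an argument against passing string.whitespace explicitly, as the stripchars default does.

```python
import string

# Made-up sample: a leading non-breaking space (U+00A0) and a trailing newline.
sample = '\u00a0 body \n'

# No argument: every character that Unicode classifies as whitespace is stripped.
stripped_all = sample.strip()

# Explicit character set: only the ASCII whitespace listed in string.whitespace
# is stripped, so the leading non-breaking space (and the space after it) survives.
stripped_ascii = sample.strip(string.whitespace)

print(repr(stripped_all))    # 'body'
print(repr(stripped_ascii))  # '\xa0 body'
```

Passing the character set explicitly is what lets a caller decide which whitespace should survive in the parsed result.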

Happy to receive any constructive criticism.

Answer 2

I only saw your answer a few minutes ago, and by then I had already come up with this approach:

body = pp.nestedExpr( '{', '}', content = (pp.LineEnd() | name.setWhitespaceChars(' ')))

Changing body to this definition gives these results:

name: works
body: [['fine']]

name: it
body: [['\n', 'does', 'not', '\n', 'preserve', 'newlines', '\n']]
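If you would rather keep this grammar and regroup the tokens afterwards, one possible post-processing step is sketched below. The helper name split_lines is hypothetical, and the sketch assumes a flat list of string tokens in which each line break appears as a separate '\n' token, as in the output above; nested groups would need extra handling.

```python
from itertools import groupby

def split_lines(tokens):
    # Hypothetical helper: group consecutive tokens into lines, treating every
    # newline token as a separator and joining the words of a line with spaces.
    return [
        ' '.join(group)
        for is_newline, group in groupby(tokens, key=lambda tok: tok == '\n')
        if not is_newline
    ]

# Token list copied from the parsed body shown above
tokens = ['\n', 'does', 'not', '\n', 'preserve', 'newlines', '\n']
print(split_lines(tokens))  # ['does not', 'preserve newlines']
```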

EDIT:

Wait, if what you want is the separate lines, then perhaps this is more what you are looking for:

single_line = pp.OneOrMore(name.setWhitespaceChars(' ')).setParseAction(' '.join)
multi_line = pp.OneOrMore(pp.Optional(single_line) + pp.LineEnd().suppress())
body = pp.nestedExpr( '{', '}', content = multi_line | single_line )

Gives:

name: works
body: [['fine']]

name: it
body: [['does not', 'preserve newlines']]
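For convenience, the pieces from the question and from this edit can be put together into one script. This is only a sketch: it assumes a pyparsing release that still accepts the camelCase names used in this thread (nestedExpr, setWhitespaceChars, searchString), and the variable results is introduced here for illustration.

```python
import pyparsing as pp

# Parse expressions like: \name{body}
name = pp.Word(pp.alphas)
expr_name = name('name')  # copy taken before the whitespace settings are changed

# Inside a body, words may only be separated by spaces, so newlines stay visible
word = name.setWhitespaceChars(' ')
single_line = pp.OneOrMore(word).setParseAction(' '.join)
multi_line = pp.OneOrMore(pp.Optional(single_line) + pp.LineEnd().suppress())
body = pp.nestedExpr('{', '}', content=multi_line | single_line)

expr = '\\' + expr_name + body('body')

# Raw string, so the backslashes are kept literally
txt = r'''
This \works{fine}, but \it{
    does not
    preserve newlines
}
'''

results = expr.searchString(txt)
for e in results:
    print('name: ' + e.name)
    print('body: ' + str(e.body) + '\n')
```

With this grammar each line of a multi-line body should come back as one joined string, matching the output shown above.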
