Reference token value at parse time

Question

I am trying to parse the following:

<delimiter><text><delimiter><text><delimiter>

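where delimiter can be any single literal character, repeated three times, and text can be any printable characters other than the delimiter (the first and second occurrences of text do not have to match and can be empty).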

This is what I came up with, but text consumes everything from the first delimiter to the end of the string.

from pyparsing import Word, printables

delimiter = Word(printables, exact=1)
text = (Word(printables) + ~delimiter)

parser = delimiter + text  # + delimiter + text + delimiter

tests = [
    ('_abc_123_', ['_', 'abc', '_', '123', '_']),
    ('-abc-123-', ['-', 'abc', '-', '123', '-']),
    ('___', ['_', '', '_', '', '_']),
]

for test, expected in tests:
    print(parser.parseString(test), '<=>', expected)

Script output:

['_', 'abc_123_'] <=> ['_', 'abc', '_', '123', '_']
['-', 'abc-123-'] <=> ['-', 'abc', '-', '123', '-']
['_', '__'] <=> ['_', '', '_', '', '_']

I think I need to use a Future so that I can exclude the value of delimiter from the text tokens at parse time.

Answer 1

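Your instinct was correct: you need to use a Forward (not a Future) to hold the definition of text, since that definition is not fully known until parse time. Also, your Word must exclude the delimiter character by using the excludeChars argument. Just using Word(printables) + ~delimiter is not sufficient, because Word(printables) has already consumed every printable character, delimiters included, before the negative lookahead is checked.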
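To illustrate the difference, here is a minimal sketch comparing the two approaches on the tail of the first test string. It hard-codes "_" as the delimiter for demonstration only, and the variable names are made up for this example.

```python
from pyparsing import Word, printables

sample = "abc_123_"

# Word(printables) is greedy: it consumes the delimiters too, and the
# negative lookahead is only checked afterwards, at the end of the string.
greedy = Word(printables) + ~Word(printables, exact=1)
greedy_result = greedy.parseString(sample).asList()

# excludeChars removes the delimiter from the set of characters Word may match,
# so matching stops right before the first "_".
limited = Word(printables, excludeChars="_")
limited_result = limited.parseString(sample).asList()

print(greedy_result)
print(limited_result)
```

The first expression should return the whole string as a single token, while the second should stop at "abc".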

Here is your code with the necessary changes marked, and hopefully some helpful comments:

from pyparsing import Word, printables, Forward, Optional

delimiter = Word(printables, exact=1)
text = Forward() #(Word(printables) + ~delimiter)
def setTextExcludingDelimiter(s,l,t):
    # define Word as all printable characters, excluding the delimiter character
    # the excludeChars argument for Word is how this is done
    text_word = Word(printables, excludeChars=t[0]).setName("text")
    # use '<<' operator to assign the text_word definition to the 
    # previously defined text expression
    text << text_word
# attach parse action to delimiter, so that once it is matched, 
# it will define the correct expression for text
delimiter.setParseAction(setTextExcludingDelimiter)

# make the text expressions Optional with default value of '' to satisfy 3rd test case
parser = delimiter + Optional(text,'') + delimiter + Optional(text,'') + delimiter
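For completeness, here is a sketch of the whole script with the question's test loop run against the corrected parser. It assumes Python 3 and a pyparsing release that still accepts the camelCase names used above (parseString, setParseAction, excludeChars). The function name set_text_excluding_delimiter and the results list are renamed or added for this example only.

```python
from pyparsing import Word, printables, Forward, Optional

delimiter = Word(printables, exact=1)
text = Forward()  # placeholder; the real definition is filled in at parse time


def set_text_excluding_delimiter(s, l, t):
    # t[0] is the delimiter character that was just matched;
    # redefine text as a Word of printables without that character
    text << Word(printables, excludeChars=t[0])


delimiter.setParseAction(set_text_excluding_delimiter)

parser = delimiter + Optional(text, '') + delimiter + Optional(text, '') + delimiter

tests = [
    ('_abc_123_', ['_', 'abc', '_', '123', '_']),
    ('-abc-123-', ['-', 'abc', '-', '123', '-']),
    ('___', ['_', '', '_', '', '_']),
]

results = []
for test, expected in tests:
    result = parser.parseString(test).asList()
    results.append(result)
    print(result, '<=>', expected)
```

Each parsed list should now match its expected value. Note that the '<<' operator is used inside the function rather than '<<=', since an augmented assignment there would make text a local name.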

python

parsing

pyparsing