How to match with an expression grammar only the last time a keyword occurs

Question

I want to write an expression grammar that matches strings of the following form:

words at the start ONE|ANOTHER wordAtTheEnd

---------^-------- ----^----- --^--
     A: alphas     B: choice  C: alphas

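The problem, however, is that part A can itself contain the keywords "ONE" or "ANOTHER" from part B, so only the last occurrence of a choice keyword should be matched as part B. Here is an example: the string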

ZERO ONE or TWO are numbers ANOTHER letsendhere

should be parsed into the fields

A: "ZERO ONE or TWO are numbers"
B: "ANOTHER"
C: "letsendhere"

Using pyparsing, I tried the "stopOn" keyword of the OneOrMore expression:

import pyparsing as pp

choice = pp.Or([pp.Keyword("ONE"), pp.Keyword("ANOTHER")])('B')
start = pp.OneOrMore(pp.Word(pp.alphas), stopOn=choice)('A')
end = pp.Word(pp.alphas)('C')
expr = (start + choice) + end

But this does not work. For the example string, I get a ParseException:

Expected end of text (at char 12), (line:1, col:13)
"ZERO ONE or >!<TWO are numbers ANOTHER text"

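This makes sense, because stopOn stops at the first occurrence of choice rather than at the last one. How can I write a grammar that stops at the last occurrence instead? Perhaps I need to resort to a context-sensitive grammar?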

Answer 1

Sometimes you have to try to "be the parser". What distinguishes the "last occurrence of X" from the other X's? One way to put it is "an X that is not followed by any more X's". With pyparsing, you can write a helper method like this:

from pyparsing import FollowedBy, OneOrMore, SkipTo, Word, alphas, nums

def last_occurrence_of(expr):
    return expr + ~FollowedBy(SkipTo(expr))

Here it is used as the stopOn argument of OneOrMore:

integer = Word(nums)
word = Word(alphas)
list_of_words_and_ints = OneOrMore(integer | word, stopOn=last_occurrence_of(integer)) + integer

print(list_of_words_and_ints.parseString("sldkfj 123 sdlkjff 123 lklj lkj 2344 234 lkj lkjj"))

Prints:

['sldkfj', '123', 'sdlkjff', '123', 'lklj', 'lkj', '2344', '234']
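
The same helper can probably be applied to the grammar from the question. The following is only a sketch under a few assumptions: the second keyword is taken to be "ANOTHER" (as in the example string), the names keyword, word and part_a are made up for illustration, and the parse action that joins the words of part A into a single string is an addition that is not part of the original answer.

```python
import pyparsing as pp


def last_occurrence_of(expr):
    # expr matches here, and no further expr can be found later in the input
    return expr + ~pp.FollowedBy(pp.SkipTo(expr))


# Assumed keywords, taken from the example string in the question
keyword = pp.Keyword("ONE") | pp.Keyword("ANOTHER")
word = pp.Word(pp.alphas)

# Part A: consume words until the *last* keyword is about to be parsed
part_a = pp.OneOrMore(word, stopOn=last_occurrence_of(keyword))
# Illustrative addition: join the words of part A into one string
part_a.setParseAction(lambda toks: " ".join(toks))

expr = part_a("A") + keyword("B") + word("C")

result = expr.parseString(
    "ZERO ONE or TWO are numbers ANOTHER letsendhere", parseAll=True
)
print(result.asList())
```

The earlier "ONE" is consumed as an ordinary word of part A, because SkipTo can still find "ANOTHER" further along, so the stopOn sentinel does not fire there. Note that SkipTo rescans the rest of the input for every keyword it meets, which is fine for short strings but may be slow on long ones.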
