Pyparsing - grammar with recursion

I am trying to create a grammar that will parse the following expressions:

  1. func()

  2. func(a)

  3. func(a) + func(b)

  4. func(func(a) + func()) + func(b)

I have implemented it for (1) and (2), but as soon as I extend rvalue << (identifier | function_call) with operation, it stops working and fails with:

Exception raised:Expected W:(ABCD...), found ')'  (at char 5), (line:1, col:6)
Exception raised:maximum recursion depth exceeded

Can anyone explain why? As far as I understood, in the expression rvalue << (identifier | function_call | operation), function_call should be matched before operation, so the recursion shouldn't take place.

Code:

from pyparsing import Forward, Optional, Word, Literal, alphanums, delimitedList

rvalue = Forward()

operation = rvalue + Literal('+') + rvalue
identifier = Word(alphanums + '_')('identifier')
function_args = delimitedList(rvalue)('function_args')

function_name = identifier('function_name')
function_call = (
    (function_name + Literal("(") + Optional(function_args) + Literal(")"))
)('function_call')

rvalue << (identifier | function_call | operation)
function_call.setDebug()


def test_function_call_no_args():
    bdict = function_call.parseString("func()", parseAll=True).asDict()
    assert bdict['function_name'] == 'func'
    assert 'function_args' not in bdict


def test_function_call_one_arg():
    bdict = function_call.parseString("func(arg)", parseAll=True).asDict()
    assert bdict['function_name'] == 'func'
    assert 'function_args' in bdict


def test_function_call_many_args():
    bdict = function_call.parseString("func(arg1, arg2)", parseAll=True).asDict()
    assert bdict['function_name'] == 'func'
    assert 'function_args' in bdict

As far as I understood in expression rvalue << (identifier | function_call | operation) function_call should be matched before operation and the recursion shouldn't take place.

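If one of the earlier alternatives succeeds, the recursion won't take place. However, if both of them fail, operation is tried and you get infinite recursion.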

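For example, in test_function_call_no_args you try to parse func() with the function_call rule. This parses func as the function name and ( as the start of the argument list. It then tries to parse Optional(function_args), which in turn tries to parse delimitedList(rvalue). That tries to parse rvalue, and since ) matches neither of the first two alternatives, it tries the last one, operation. But operation itself begins with rvalue, so it is tried again at the same position, and so on, which leads to infinite recursion.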

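When a rule is recursive, you must always consume input before the recursion is reached: it must be impossible to reach the recursion without consuming input. So it is not enough for the recursion to come last in the list of alternatives. There must actually be another non-optional rule (one that also does not match the empty string) that is invoked successfully before it.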
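
One common way to satisfy this rule is to replace the left-recursive operation with repetition. The following is only a sketch of that idea, not the only possible fix: the operand name and the use of Group, Suppress and ZeroOrMore are my own choices and are not part of the original grammar. Every path back into rvalue now passes through an identifier and an opening parenthesis first, so input is always consumed before the recursion.

```python
from pyparsing import (
    Forward, Group, Literal, Optional, Suppress, Word, ZeroOrMore,
    alphanums, delimitedList,
)

rvalue = Forward()

identifier = Word(alphanums + '_')
function_args = Group(delimitedList(rvalue))('function_args')
function_call = Group(
    identifier('function_name')
    + Suppress('(') + Optional(function_args) + Suppress(')')
)

# "operand" is a hypothetical helper name. function_call is tried first
# because it also starts with an identifier.
operand = function_call | identifier

# "a + b + c" expressed as repetition instead of left recursion:
# one operand, followed by zero or more "+ operand" pairs.
rvalue <<= operand + ZeroOrMore(Literal('+') + operand)

if __name__ == '__main__':
    for text in ("func()", "func(a)", "func(a) + func(b)",
                 "func(func(a) + func()) + func(b)"):
        print(text, '->', rvalue.parseString(text, parseAll=True).asList())
```

With this shape, rvalue can only be re-entered from inside the parentheses of a function call, which means at least the function name and ( have already been consumed.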

PS: rvalue can never actually match a function call, because a function call starts with an identifier and you try identifier first.
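
The effect of the ordering can be seen in isolation with a small sketch. The names call, wrong_order, right_order and parses_fully are hypothetical and exist only for this illustration. The | operator takes the first alternative that matches, so placing identifier first means the parenthesised form is never attempted.

```python
from pyparsing import ParseException, Suppress, Word, alphanums

identifier = Word(alphanums + '_')
# A stripped-down call with no arguments, just to show the ordering issue.
call = identifier + Suppress('(') + Suppress(')')

wrong_order = identifier | call   # identifier wins, call is never tried
right_order = call | identifier   # call is tried first, identifier is the fallback


def parses_fully(expr, text):
    """Return True if expr matches the whole of text."""
    try:
        expr.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False


if __name__ == '__main__':
    print(parses_fully(wrong_order, "func()"))
    print(parses_fully(right_order, "func()"))
    print(parses_fully(right_order, "func"))
```

With wrong_order, only func is consumed and the trailing () is left over, so a full-input parse fails. With right_order, both func() and a bare func are accepted.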