解析右结合运算符（指数）

Question

我一直在为自己的语言编写 lexer/parser/interpreter，到目前为止一切正常。我一直在关注 Ruslan Spivak's blog (Github link 中每篇文章的示例。

我想扩展我的语言语法，使其超越文章中所写的内容，以包含更多运算符，例如比较运算符（<、>= 等）以及指数运算符（**或我的语言 ^）。我有这个语法：

expression        : exponent ((ADD | SUB) exponent)*
exponent          : term ((POWER) term)*
# this one is right-associative (powers **)

term              : comparison ((MUL | DIV) comparison)*
comparison        : factor ((EQUAl   | L_EQUAL | LESS
                             N_EQUAL | G_EQUAL | GREATER) factor)*
# these are all binary operations



factor            : NUM | STR        | variable
                        | ADD factor | SUB factor
                        | LPAREN expr RPAREN
# different types of 'base' types like integers
# also contains parenthesised expressions which are evalutaed first

在解析令牌方面，我使用的方法与Ruslan 的博客中使用的方法相同。这是一个将解析 exponent 行的代码，它处理加法和减法，尽管它的名称如此，因为语法表明表达式被解析为 exponent_expr (+ / -) exponent_expr

def exponent(self):
    node = self.term()
    while self.current_token.type in (ADD, SUB):
        token = self.current_token

        if token.type == ADD:
            self.consume_token(ADD)
        elif token.type == SUB:
            self.consume_token(SUB)

        node = BinaryOperation(left_node=node,
                               operator=token,
                               right_node=self.term())

    return node

现在这可以很好地解析左结合标记（因为标记流自然地从左到右），但我一直在研究如何解析右结合指数。看这个预期in/out，供参考：

>>> 2 ** 3 ** 2
# should be parsed as...
>>> 2 ** (3 ** 2)
# which is...
>>> 2 ** 9
# which returns...
512

# Mine, at the moment, parses it as...
>>> (2 ** 3) ** 2
# which is...
>>> 8 ** 2
# which returns...
64

为了解决这个问题，我尝试切换 BinaryOperation() 构造函数的左右节点，使当前节点在右，新节点在左，但这只会使 2**5 解析为 5**2 这给了我 25 而不是预期的 32.

有什么我可以尝试的方法吗？

Answer 1

您的 exponent 函数实际解析 expression 的事实应该是一个危险信号。事实上，您需要的是一个解析表达式的 expression 函数和一个解析求幂的 exponent 函数。

您还混淆了求幂和乘法（以及其他运算）的优先级，因为 2 * x ** 4 并不意味着 (2 * x) ** 4（即 16x⁴），而是2 * (x ** 4)。出于同样的原因，x * 3 < 17 并不意味着 x * (3 < 17)，这是您的语法将如何解析它。

通常算术的优先级是这样的：

 comparison     <, <=, ==, ... ( lowest precedence)
 additive       +, -
 multiplicative *, /, %
 unary          +, -
 exponentiation **
 atoms          numbers, variables, parenthesized expressions, etc.

（如果你有像函数调用这样的后缀运算符，它们会介于求幂和原子之间。）

以这种形式修改语法后，指数解析器将如下所示：

def exponent(self):
    node = self.term()
    while self.current_token.type is POWER:
        self.consume_token(ADD)
        node = BinaryOperation(left_node=node,
                               operator=token,
                               right_node=self.exponent())
    return node

最后的递归调用产生了正确的结合性。在这种情况下，递归是可以接受的，因为左操作数和运算符已经被消耗掉了。因此递归调用不会产生无限循环。

解析右结合运算符（指数）

Parsing right-associative operator (exponents)

python

parsing

token

right-to-left