递归下降解析器：RPN [2] 的中缀

Question

这是我尝试制作递归下降解析器——LL(1)——的延续，它接受中缀表达式并输出 RPN。这是我的的 link，@rici 做了出色的回答，我希望我能用这个修改后的实现来公正地回答他的问题。我的新语法如下（不支持一元运算符）：

expr -> term (+|-) term | term
term -> exponent (*|/) exponent | exponent
exponent -> factor ^ factor | factor
factor -> number | ( expr )

@rici 在他的回答中指出 Norwell's grammar:

We normally put the unary negation operator between multiplication and exponentiation

而且我已经尝试在这里进行合作：

expr -> term (+|-) term | term
term -> exponent1 (*|/) exponent1 | exponent1
exponent1 -> (+|-) exponent | exponent 
exponent -> factor ^ factor | factor
factor -> number | ( expr )

第一个语法的编码使其无法接受一元 (+/-) 数字，只有二元 -/+ 运算符才能被接受。该解决方案适用于我尝试过的问题数量（可能是错误的，希望了解更多）。然而，仔细检查后，第二个失败了，我被迫回到我第一个使用的 "hack" 。正如@rici 指出的那样：

By the way, your output is not Reverse Polish Notation (and nor is it unambiguous without parentheses) because you output unary operators before their operands.

公平地说，他确实指出添加了额外的 0 操作数，这很好，我认为它会起作用。但是如果我做 13/-5 这个，它的等价中缀是 13/0-5 和它的 RPN 13 0 / 5 -。或许我误解了他的意思。

最后把钉子钉在棺材上@rici 也指出：

left-recursion elimination would have deleted the distinction between left-associative and right-associative operators

因此这意味着几乎不可能确定任何运算符的结合性，因此所有运算符都相同而 none 不同。此外，这意味着对于简单的 LL(1) 解析器来说，如果不是不可能的话，尝试支持许多左右结合运算符将非常困难。

这是语法的 C 代码实现：


#include <stdio.h>
#include <stdlib.h>

void error();
void factor();
void expr();
void term();
void exponent1();
void exponent();
void parseNumber();
void match(int t);

char lookahead;
int position=0;
int main() {
  lookahead = getchar();
  expr();
return 0;
}

void error() {
  printf("\nSyntax error at lookahead %c pos: %d\n",lookahead,position);
  exit(1);
}


void factor() {
 if (isdigit(lookahead)) {
      parseNumber();
     // printf("lookahead at %c",lookahead);
  } else if(lookahead =='('){
      match('(');
      expr();
      match(')');
  }else {
      error();
      }
}

void expr(){
    term();
    while(1){
        if(!lookahead||lookahead =='\n') break;
        if(lookahead=='+'|| lookahead=='-'){
            char token  = lookahead;
            match(lookahead);
            term();
            printf(" %c ", token);
        }else {
            break;
        }
    }
}

void term(){
    exponent1();
    while(1){
        if(!lookahead||lookahead =='\n') break;
        if(lookahead=='/'|| lookahead=='*'){
            char token = lookahead;
            match(lookahead);
            exponent1();
            printf(" %c ", token);
        }else {
            break;
        }
    }
}

void exponent1(){
    if(lookahead=='-'||lookahead=='+'){
        char token  = lookahead;
        match(lookahead);
        //having the printf here:
        printf("%c", token);
        //passes this:
        //  2+6*2--5/3      := 2.00 6.00 2.00  *  + 5.00 3.00  /  -
        // -1+((-2-1)+3)*-2 := -1.00 -2.00 1.00  - 3.00  + -2.00  *  + (not actual RPN @rici mentions)
        //but fails at:
        // -(3/2)           := -3.00 2.00  /
        // -3/2             := -3.00 2.00  /
        exponent();
        // but having the printf here
        //printf("%c ", token);
        // fails this -1+((-2-1)+3)*-2 := 1.00 - 2.00 - 1.00  - 3.00  + 2.00 -  *  +
        // since it is supposed to be
        //  1.00 - -2.00 1.00 - 3.00 + -2.00 * +
        //  but satisfies this:
        // -(3/2) := 3.00 2.00  / -
        // (-3/2) := 3.00 - 2.00  /
    }else {
        exponent();
        //error();
    }
}

void exponent(){
    factor();
        while(1){
        if(!lookahead||lookahead =='\n') break;
        if(lookahead=='^'){
            char token  = lookahead;
            match('^');
            factor();
            printf(" ^ ");
        }else {
            break;
        }
    }
}

void parseNumber() {
  double number = 0;
  if (lookahead == '[=12=]'|| lookahead=='\n') return;
  while (lookahead >= '0' && lookahead <= '9') {
    number = number * 10 + lookahead - '0';
    match(lookahead);
  }
  if (lookahead == '.') {
    match(lookahead);
    double weight = 1;
    while (lookahead >= '0' && lookahead <= '9') {
      weight /= 10;
      number = number + (lookahead - '0') * weight;
      match(lookahead);
    }
  }
  printf("%.2f ", number);
  //printf("\ncurrent look ahead at after exiting parseNumber %c\n",lookahead);
}
void match(int t) {
  if (lookahead == t){
    lookahead = getchar();
    position++;
    }
  else error();

}

那么这是否意味着我应该放弃 LL(1) 解析器并转而查看 LR 解析器？或者可以增加前瞻数量帮助，如果有很多路径，那么它可能会缩小范围，减少前瞻的前瞻。例如：

-(5;;看起来很奇怪

-( 5 ;;可能是 - ( exp )

或

--5;;可能有很多东西

-- 5 ;;应该是 -- 运算符和输出说 #

编辑：

我认为具有更大的前瞻性将很难协调。因此，也许有一些类似于调车场算法的东西，我喜欢在其中查看下一个运算符，并根据运算符的优先级，算法将确定要执行的函数调用。类似于使用实际运行程序的实际堆栈。所以 pop 是 return 而 push 是函数调用。不确定我如何通过递归下降来协调它。
也许 peek 的优先级应该决定前瞻长度？

Answer 1

增加前瞻性没有帮助。
这是算术表达式的常用 LALR(1) 文法，包括求幂：
```
expr -> sum
sum -> sum (+|-) product | product
product -> product (*|/) prefix | prefix
prefix -> (+|-) prefix | exponent 
exponent -> atom ^ exponent | atom
atom -> number | ( expr )
```
您可以在整个 Internet 上找到构建语法模型的示例，尽管您还会发现许多示例，其中始终使用相同的非终端，并且使用优先级声明处理由此产生的歧义。

请注意 exponent 与其他二元运算符之间的结构差异。 exponent 是右递归的（因为求幂是右结合的）；其他的是左递归的（因为其他二元运算符是左结合的）。
当我说您可以通过添加一个明确的 0 来解决前缀运算符字符的歧义时，我并不是说您应该编辑您的输入以插入 0。那不会工作，因为（正如您所注意到的）它使一元运算符的优先级错误。我的意思是递归下降 RPN 转换器应该看起来像这样：
```
void parsePrefix(void) {
  if (lookahead=='-'||lookahead=='+') {
    char token = lookahead;
    match(lookahead);
    fputs("0 ", stdout);
    parsePrefix();
    printf("%c ", token);
  }
  else {
    parseExponent();
  }
}
```
在需要去的地方准确地输出 0。

警告：以下段落纯属观点，不符合 Whosebug 的政策。如果这会冒犯您，请不要阅读。（在这种情况下，也许您应该跳过此答案的其余部分。）

恕我直言，这确实是一个 hack，但 RPN 的使用也是如此。如果您要构建 AST，您只需构建一个具有单个操作数的 unaryOperator AST 节点。不会有歧义问题，因为在评估期间不需要再次解释令牌。无论出于何种原因，学习过通常的编译器理论的学生类似乎从中脱颖而出，认为 AST 在某种程度上是一种复杂的技术，在必要时应避免使用，必须不惜一切代价避免左递归，并且编写自己的 LL 解析器而不是仅仅使用标准可用的 LALR 解析器生成器具有道德价值。我不同意所有这些事情。特别是，我建议您从创建 AST 开始，因为它会使几乎所有其他事情变得更容易。此外，如果您想了解解析，请从使用标准工具开始，专注于编写清晰的、自文档化的语法，并使用有关您尝试解析的输入的句法结构的信息。稍后您可以了解解析器生成器的工作原理，如果您真的觉得那很有趣的话。同样，我绝不会从 sin() 函数的泰勒展开式的准确计算开始来教授三角学。这并没有为学生提供任何关于如何使用三角函数（例如，旋转一个角度）的任何见解，这无疑是三角学中最重要的部分。一旦学生对三角函数、如何使用三角函数，特别是典型问题领域中精确计算的要求有了深刻的理解，那么他们可能会想看看泰勒展开式和其他计算技术。但大多数人会满足于调用 sin()，我认为这很完美。
如果你真的想使用递归下降解析器，那就去吧。他们当然可以工作。

当您将语法编码为可执行程序时会发生什么，您将慢慢开始偏离可用于其他程序（如语法着色器和静态分析器）的语法表示。部分原因是您使用的语法丢失了语法的重要方面，包括关联性，而这些功能直接编码到您的解析器代码中。最终的结果往往是只维护解析器代码，而任由理论语法腐烂。当代码本身是 "grammar" 时，它不再可用作您的语言语法的实用文档。

但我并不是说不能做到。这肯定是可以做到的，并且在实际使用中有很多解析器都是这样做的。
调车场算法（以及一般的运算符优先级解析）是一种自下而上的技术，就像 LR 解析一样，它们都不需要递归解析器。如果出于某种原因你真的想使用递归，你可以改用 Pratt 解析器，但是自下而上的解析比递归下降有一个巨大的实际优势：它消除了无法控制的堆栈溢出的可能性。因此很难推荐在生产解析器中使用递归下降，除非严格控制输入文本以避免可能通过堆栈溢出进行的攻击。这可能不适用于未与未经验证的输入一起使用的编译器。但是你的编译器是这样吗？您是否从未从国外站点下载过源码压缩包，然后输入 ./configure && make all？ :-)

递归下降解析器：RPN [2] 的中缀

Recursive descent Parser: infix to RPN [2]

algorithm

parsing

recursive-descent

infix-notation

bnf

编辑：