How to skip the rest of the tokens in a line after an error in Bison

Question

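I'm writing an application for a homework assignment that uses Flex and Bison to determine whether a statement is valid. After an error is detected in a statement, I want to print an error message and move on to the next line to look at the next statement, but nothing I have tried works.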

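From what I found online, Bison has a built-in error token that can be used for error handling. By using error '\n' {yyerrok;}, I should be able to get the behavior I want, but it doesn't work.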

My Flex code:

%{
  #include <cstdio>
  #include <iostream>
  using namespace std;

  #include "exp.tab.h"  // to get the token types from Bison

%}
%%

--.*                    ;
[a-zA-Z][a-zA-Z0-9]*    {yylval.print = strdup(yytext); return ID;}
;\s*                    {return EOL;}
[-+*/%]                 {yylval.print = strdup(yytext); return OP;}
=                       {return EQU;}
\(                      {return OPEN;}
\)                      {return CLOSE;}
[0-9]                   ;
\n                      ;
\r                      ;
.                       ;
%%

My Bison tokens and rules:

%union{

    char *print;

}

%token EQU
%token <print> ID
%token EOL
%token <print> OP
%token OPEN
%token CLOSE

%%

lines: line
    |   lines line
;

line: ass {cout << " VALID" << endl;}
    |   exp {cout << " VALID" << endl;}
    |   error '\n' {yyerrok;}
;

ass: id EQU {cout << " ="; } exp EOL {cout << ";";}
;

exp: term
    |   exp op term 
;

term: id 
    |   OPEN {cout << "(";} exp op term CLOSE {cout << ")";}
;

id: ID {cout << $1; }

op: OP {cout << $1; }


%%

My yyerror() just prints "Error ".

My input to parse:

-- Good (valid) statements:

first = one1 + two2 - three3 / four4 ;
second = one1 * (two2 * three3) ;
one1 * i8766e98e + bignum
second = (one1 * two2) * three3 ;
third = ONE + twenty - three3 ;
third = old * thirty2 / b567 ;

-- Bad (invalid) statements:

first = = one1 + two2 - three3 / four4 ;
first = one1 + - two2 - three3 / four4 ;
first = one1 + two2 - three3 / four4
first = one1 + two2 ? three3 / four4 ;
second = 4 + ( one1 * two2 ) * ( three3 + four4 ;
third = one1 + 24 - three3 ;
one1 +- delta
sixty6 / min = fourth ;

I want the output to print the error and then move on to the next line:

first =one1+two2-three3/four4; VALID
second =one1*(two2*three3); VALID
one1*i8766e98e+bignum VALID
second =(one1*two2)*three3; VALID
third =ONE+twenty-three3; VALID
third =old*thirty2/b567; VALID
first = Error
first = one1 + Error
first = one1 + two2 - three3 / four4 Error
first = one1 + two2 Error
.
.
.

But when I run it, it just stops after printing the first error:

first =one1+two2-three3/four4; VALID
second =one1*(two2*three3); VALID
one1*i8766e98e+bignum VALID
second =(one1*two2)*three3; VALID
third =ONE+twenty-three3; VALID
third =old*thirty2/b567; VALID
first = Error

Any help would be appreciated, but mainly I'd like to know why the error '\n' rule doesn't work and what I can do to fix it.

Answer 1

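Using '\n' doesn't work because your lexer never returns '\n', so there will never be any '\n' tokens in the token stream. Basically, if the lexer ignores certain characters, you can't use them in the parser in any way, including for error recovery.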

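So your two choices are to stop ignoring newlines (probably a bad idea, since you would then have to mention them in the grammar everywhere you want to allow a newline) or to use some other token for error recovery. Skipping everything up to the next semicolon is probably a good alternative (although that still won't produce your expected output, since not all of your lines end with a semicolon).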
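As a rough sketch of the semicolon-based recovery, the error alternative in the question's line rule could resynchronize on the existing EOL token (which the question's lexer returns for `;`) instead of on '\n'. This fragment is meant to replace only the line rule in the question's grammar; everything else is assumed to stay as posted.

```yacc
line: ass         { cout << " VALID" << endl; }
    | exp         { cout << " VALID" << endl; }
    | error EOL   { yyerrok; }   /* discard tokens up to and including the next ';' */
    ;
```

With this rule, a statement that is missing its semicolon is swallowed together with the statement that follows it, because the parser keeps discarding tokens until it finds the next `;`.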

Answer 2

Because your lexer ignores \n, telling the parser to skip tokens until it sees a newline will cause it to skip the rest of the file.

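However, you can (almost) make this work by having the lexer recognize newlines, but only during error recovery. (Check in the action for \n and either ignore it or send it.)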
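One way this might look, as a sketch only: the fragments below assume a global flag named in_error (a hypothetical name, not part of Flex or Bison) that yyerror() sets and the recovery action clears. They are meant to be spliced into the question's scanner and parser rather than compiled on their own.

```yacc
  /* Scanner fragment (Flex file). in_error is a hypothetical flag defined in the parser. */
%{
  extern bool in_error;
%}
%%
\n      { if (in_error) return '\n'; }   /* pass the newline on only while recovering */

  /* Parser fragment (Bison file). */
%{
  bool in_error = false;          /* hypothetical: true between yyerror() and recovery */
  void yyerror(const char *msg);
%}
%%
line: ass          { cout << " VALID" << endl; }
    | exp          { cout << " VALID" << endl; }
    | error '\n'   { in_error = false; yyerrok; }   /* back to ignoring newlines */
    ;
%%
void yyerror(const char *msg) {
  cout << "Error ";               /* same message as in the question */
  in_error = true;                /* from now on the scanner returns '\n' */
}
```

The sketch relies on the parser reducing the error '\n' rule before it asks the scanner for another token, which is normally the case when that reduction is the only action available in the state; otherwise a blank line right after the error could be returned as a stray '\n'.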

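But this will occasionally produce odd results, because the token that triggers the error might be on the next line, in which case the newline will already have been ignored before the error is detected. For example, the problem here is a missing semicolon: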

a = 1
while (a > 0) {
    …

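But the error is only detected after while has been read. (If the next token had been +, the parse should have continued.) So skipping to the end of the line means that parsing resumes on the third line, which introduces an unbalanced brace.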

Even so, it might be an interesting place to start.
