Unexpected behavior of generated Bison parser
I created a parser with Flex/Bison, and it fails unexpectedly during parsing. Here is a simplified example that shows the problem.
Lexer.l:
%{
#include "Parser.h"
%}
%option noyywrap nodefault
%%
"foo" { return FOO; }
"bar" { return BAR; }
"(" { return OP; }
")" { return CP; }
[ \t\n]+ { /* DO NOTHING */ }
. { YY_FATAL_ERROR("unknown character"); }
%%
And Parser.y (with tracing and verbose output enabled):
%{
#include <stdio.h>
int yylex();
void yyerror (char const *s);
%}
%debug
%verbose
%error-verbose
%token FOO BAR OP CP
%%
program_expr : foo_expr bar_expr {}
;
foo_expr : /* NOTHING */ {}
| OP FOO CP {}
;
bar_expr : /* NOTHING */ {}
| OP BAR CP {}
;
%%
int main(int argc, char** argv)
{
    yydebug = 1;
    yyparse();
    return 0;
}
void yyerror (char const *s) { fprintf(stderr, "%s\n", s); }
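But if I give it input such as (bar), the generated parser fails, even though in that case the parse tree should simply contain an empty foo expression. It reports: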
Starting parse
Entering state 0
Reading a token: Next token is token OP ()
Shifting token OP ()
Entering state 1
Reading a token: Next token is token BAR ()
syntax error, unexpected BAR, expecting FOO
Error: popping token OP ()
Stack now 0
Cleanup: discarding lookahead token BAR ()
Stack now 0
Here is an excerpt from the generated description of the shift/reduce automaton:
state 0
0 $accept: . program_expr $end
OP shift, and go to state 1
OP [reduce using rule 2 (foo_expr)]
$default reduce using rule 2 (foo_expr)
program_expr go to state 2
foo_expr go to state 3
state 1
3 foo_expr: OP . FOO CP
FOO shift, and go to state 4
state 2
0 $accept: program_expr . $end
$end shift, and go to state 5
state 3
1 program_expr: foo_expr . bar_expr
OP shift, and go to state 6
$default reduce using rule 4 (bar_expr)
bar_expr go to state 7
But I cannot understand the meaning/syntax of these states. What is wrong with my grammar/parser?
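Bison generates an LALR(1) parser by default. LALR(1) stands for Look-Ahead LR with one token of lookahead: the input is read left to right, and the parser sees only the next token when choosing an action.
Your grammar is not LALR(1). On OP, it is unclear whether to expect foo or bar. That is a shift/reduce conflict: in state 0 the parser can either shift OP (committing to a non-empty foo_expr) or reduce the empty foo_expr first. The bracketed line "OP [reduce using rule 2 (foo_expr)]" in the state listing shows that Bison discarded the reduction and chose the shift, which is why (bar) is rejected with "expecting FOO".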
See here:
https://en.wikipedia.org/wiki/LALR_parser
In general, though, Bison can also generate other kinds of LR parsers. At least the wiki entry here claims so:
https://en.wikipedia.org/wiki/GNU_Bison
Your case looks like what the Bison manual calls a "mysterious conflict": https://www.gnu.org/software/bison/manual/html_node/Mysterious-Conflicts.html#Mysterious-Conflicts
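In state 0 the next token is OP, and whether the empty foo_expr should be reduced depends on the token that follows OP (FOO or BAR), which a single token of lookahead cannot see. One possible way to keep the grammar exactly as written is Bison's GLR mode, where the parser follows both the shift and the reduce and drops whichever branch hits a syntax error. The fragment below is an untested sketch: the prologue and epilogue from Parser.y are assumed unchanged, and the `%expect 1` count is an assumption based on the single conflict visible in the state 0 listing.

```yacc
%glr-parser   /* on a conflict, pursue both actions instead of picking one */
%expect 1     /* assumption: the only conflict is the OP shift/reduce in state 0 */
%token FOO BAR OP CP
%%
program_expr : foo_expr bar_expr {}
             ;
foo_expr : /* NOTHING */ {}
         | OP FOO CP {}
         ;
bar_expr : /* NOTHING */ {}
         | OP BAR CP {}
         ;
%%
```

GLR parsing costs more at run time whenever the parser has to split, so for a grammar this small it may be simpler to restructure the rules instead.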
If you only want to accept (bar) as input, you can use the following:
program_expr : foo_expr bar_expr {}
| bar_expr {}
;
instead of this:
program_expr : foo_expr bar_expr {}
;
Test output:
> echo "(bar)" | ./Parser
Starting parse
Entering state 0
Reading a token: Next token is token OP ()
Shifting token OP ()
Entering state 1
Reading a token: Next token is token BAR ()
Shifting token BAR ()
Entering state 6
Reading a token: Next token is token CP ()
Shifting token CP ()
Entering state 11
Reducing stack by rule 6 (line 20):
= token OP ()
= token BAR ()
= token CP ()
-> $$ = nterm bar_expr ()
Stack now 0
Entering state 4
Reducing stack by rule 2 (line 14):
= nterm bar_expr ()
-> $$ = nterm program_expr ()
Stack now 0
Entering state 2
Reading a token: Now at end of input.
....
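If the goal is to accept all four inputs (nothing, (foo), (bar), and (foo)(bar)) with the default LALR(1) parser, one option is to drop the empty alternatives and list the combinations explicitly, so the parser never has to reduce an empty rule before it has seen what follows OP. This is a sketch rather than a tested replacement; the nonterminal names `foo_group` and `bar_group` are made up for illustration and do not appear in the original grammar.

```yacc
%token FOO BAR OP CP
%%
program_expr : /* NOTHING */ {}          /* empty input */
             | foo_group {}              /* (foo) */
             | bar_group {}              /* (bar) */
             | foo_group bar_group {}    /* (foo)(bar) */
             ;
foo_group : OP FOO CP {}
          ;
bar_group : OP BAR CP {}
          ;
%%
```

With this shape the parser shifts OP first and only then decides between foo_group and bar_group from the next token, so one token of lookahead should be enough at every step.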