Why do I always have a shift/reduce conflict with my EOF / linebreak nonterminals?
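I'm pretty green when it comes to putting together parsing grammars, and I need some help dissecting the conflicts Menhir reports when I run into a shift/reduce conflict.
Take this small grammar as an example: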
(* {2 Tokens } *)
%token EOF
%token COLON PIPE SEMICOLON
%token <string> COUNT
%token <string> IDENTIFIER
%start <AST.t> script
%start <AST.statement> statement
%%
(* {2 Rules } *)
script:
| it = separated_list(break, statement); break?; EOF { { statements = it } }
;
statement:
| COLON*; count = COUNT?; cmd = command { AST.make_statement ~count ~cmd }
;
command:
| it = IDENTIFIER { it }
;
break:
| SEMICOLON { }
;
%%
Menhir's --explain flag produces a description of the resulting shift/reduce conflict. Unfortunately, I can't make heads or tails of it:
** Conflict (shift/reduce) in state 3.
** Token involved: SEMICOLON
** This state is reached from script after reading:
statement
** The derivations that appear below have the following common factor:
** (The question mark symbol (?) represents the spot where the derivations begin to differ.)
script
(?)
** In state 3, looking ahead at SEMICOLON, shifting is permitted
** because of the following sub-derivation:
loption(separated_nonempty_list(break,statement)) option(break) EOF
separated_nonempty_list(break,statement)
statement break separated_nonempty_list(break,statement)
. SEMICOLON
** In state 3, looking ahead at SEMICOLON, reducing production
** separated_nonempty_list(break,statement) -> statement
** is permitted because of the following sub-derivation:
loption(separated_nonempty_list(break,statement)) option(break) EOF // lookahead token appears because option(break) can begin with SEMICOLON
separated_nonempty_list(break,statement) // lookahead token is inherited
statement .
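I've spent an evening digging through documentation on what a shift/reduce conflict actually is, but I have to admit I'm having a hard time understanding what I'm reading. Could somebody give me a simple (well, as simple as possible) explanation of shift/reduce conflicts, specifically in the context of the example above?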
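The problem is that, when looking at a semicolon, the parser cannot decide whether to expect EOF or the rest of the list. In state 3 it has just read a statement and the next token is SEMICOLON. If the input is `a ; b`, the semicolon is a separator and should be shifted. If the input is `a ;` followed by EOF, the list is already complete, so the parser should first reduce `statement` to `separated_nonempty_list(break, statement)` and let `break?` consume the semicolon. With a single token of lookahead it cannot tell these two cases apart. The root cause is that you are using break as an optional terminator rather than as a separator.
I suggest you change the main rule: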
script:
| it = optterm_list(break, statement); EOF { { statements = it } }
;
and then define the optterm_list combinator yourself, like this:
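(* A possibly empty list of X with an optional trailing separator.
   A lone separator is accepted as the empty list. *)
optterm_list(separator, X):
| separator? { [] }
| l = optterm_nonempty_list(separator, X) { l }
(* One or more X. After each X the separator is shifted first; only then
   does the parser decide whether the list ends or another X follows. *)
optterm_nonempty_list(separator, X):
| x = X separator? { [ x ] }
| x = X
  separator
  xs = optterm_nonempty_list(separator, X)
  { x :: xs }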
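To see why this version avoids the conflict, it may help to look at a hand-written sketch of the same decision procedure in plain OCaml. This is only an illustration, not what Menhir generates: the `token` and `statement` types and the `parse_*` functions are hypothetical stand-ins (the unused PIPE token is omitted and `AST.statement` is replaced by a small record). The point is that the semicolon is consumed first, and only afterwards does a single token of lookahead decide between "end of list" and "another statement".

```ocaml
(* Hypothetical token type mirroring the grammar's terminals. *)
type token = COLON | SEMICOLON | COUNT of string | IDENTIFIER of string | EOF

(* Hypothetical stand-in for AST.statement. *)
type statement = { count : string option; cmd : string }

(* statement: COLON*; COUNT?; command *)
let rec parse_statement = function
  | COLON :: rest -> parse_statement rest
  | COUNT c :: IDENTIFIER cmd :: rest -> ({ count = Some c; cmd }, rest)
  | IDENTIFIER cmd :: rest -> ({ count = None; cmd }, rest)
  | _ -> failwith "expected a statement"

(* optterm_nonempty_list: X separator? | X separator optterm_nonempty_list *)
let rec parse_nonempty tokens =
  let stmt, rest = parse_statement tokens in
  match rest with
  | SEMICOLON :: after ->
      (* The semicolon is consumed unconditionally; the choice is made
         afterwards, using one token of lookahead. *)
      (match after with
       | EOF :: _ -> ([ stmt ], after)
       | _ ->
           let stmts, rest' = parse_nonempty after in
           (stmt :: stmts, rest'))
  | _ -> ([ stmt ], rest)

(* script: optterm_list(break, statement); EOF *)
let parse_script tokens =
  let stmts, rest =
    match tokens with
    | EOF :: _ -> ([], tokens)          (* empty list *)
    | SEMICOLON :: rest -> ([], rest)   (* lone separator, still empty *)
    | _ -> parse_nonempty tokens
  in
  match rest with
  | [ EOF ] -> stmts
  | _ -> failwith "expected EOF"
```

In the original grammar, by contrast, the parser would have had to commit to "separator" or "terminator" before consuming the semicolon, which is exactly the choice the conflict report describes.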