Shift/reduce 与中缀部分冲突

Question

我在使用类似 yacc 的语法实现（尤其是使用 ocamlyacc）时遇到了问题，其中包括普通中缀操作和中缀部分，例如 Haskell。我希望所有这些都是合乎语法的：

(+1)
(1+)
(+)
(1+1)

但是，我无法使它正常工作，即使是摆弄 associativity/precedence 声明也是如此。我可以在 grammar.output 中看到问题发生的地方（它正在移动我希望它减少的地方），但我无法让它按照我想要的方式进行。这是问题的简化演示。

lex.mll 有：

{
  open Parse
  exception Eof
}
rule token = parse
  | [' ' '\t'] { token lexbuf }
  | ['\n'] { EOL }
  | ['0'-'9']+ as num {INT(int_of_string num)}
  | '+' { PLUS }
  | '*' { TIMES }
  | '(' { LPAREN }
  | ')' { RPAREN }
  | eof { raise Eof }

main.ml 有：

let _ =
  try
    let lexbuf = Lexing.from_channel stdin in
    while true do
      let result = Parse.start Lex.token lexbuf in
      print_string result; print_newline(); flush stdout
    done
  with Lex.Eof -> exit 0

和parse.mly（问题所在）有：

%token <int> INT
%token PLUS TIMES
%token LPAREN RPAREN
%token EOL

%left PLUS
%left TIMES

%start start
%type <string> start
%%

start:
| expr EOL {}
;

expr:
| application {}
| expr PLUS expr {"[" ^  ^ "+" ^  ^"]"}
| expr TIMES expr {"[" ^  ^ "*" ^  ^"]"}
;

section:
| LPAREN atom PLUS RPAREN { "(" ^  ^ " +)" }
| LPAREN PLUS atom RPAREN { "(+ " ^  ^ ")" }
| LPAREN PLUS RPAREN { "(+)" }
;

application:
| atom {}
| application atom {"[" ^  ^ " " ^  ^ "]"}
;

atom:
| INT {string_of_int }
| section {  }
| LPAREN expr RPAREN { "(" ^  ^ ")" }
;

%%

运行 ocamlyacc 告诉我有 1 shift/reduce conflict。特别是这里是详细日志的相关部分：

Rules:
   6  section : LPAREN atom PLUS RPAREN
   ...
   9  application : atom
...
12: shift/reduce conflict (shift 21, reduce 9) on PLUS
state 12
        section : LPAREN atom . PLUS RPAREN  (6)
        application : atom .  (9)

        PLUS  shift 21
        INT  reduce 9
        MINUS  reduce 9
        TIMES  reduce 9
        LPAREN  reduce 9
        RPAREN  reduce 9
...
state 21
        section : LPAREN atom PLUS . RPAREN  (6)

        RPAREN  shift 26
        .  error

并且运行编译后的程序将正确解析以下所有内容：

(1+)
(+1)
(+)
1+2

但失败：

(1+2)

如果另一方面，我创建一个具有高优先级的虚拟标记 HIGH：

%left PLUS MINUS
%left TIMES
%nonassoc HIGH

然后把%prec HIGH放在规则9上：

application: atom %prec HIGH {}

在那种情况下 (1+2) 会解析，但 (1+) 不会。

我了解 shift/reduce 冲突的一般背景。我只是不知道如何协商它来解决这个解析挑战。

Answer 1

省略了很多语法，你有以下产品，所有这些都可以同时可行。

atom:    LPAREN expr RPAREN
expr:           expr PLUS expr
section: LPAREN atom PLUS RPAREN

假设我们刚刚阅读了 ( 0 -- 即 LPAREN 和 INT -- 下一个token是+。此时，我们需要将INT缩减为atom，但我们做不到判断后面的内容是否匹配 atom 或 section 规则。要匹配 atom 规则，我们需要将 atom 减少为 expr -- 通过 application -- 但是为了匹配 section 规则，我们需要它保持为 atom。所以我们有一个 shift/reduce 冲突；我们不不知道我们是否需要现在转移 +，或者在进行更多单位缩减之后。

简单的解决办法是推迟决定。如果 section 规则是：

section: LPAREN expr PLUS RPAREN

那就没问题了。我们将继续减少单位直到我们得到 expr，然后我们将移动 +，然后我们要么看到 ) 或者我们会看到一些可以启动 expr 的东西。冲突已解决。

当然，这会改变语言，使其更加宽松。我们可能不想接受：

( 3 + 4 + )

或

( (+) 3 4 + )

但是生成的语法没有歧义。我们可以让解析器继续，然后在减少 section 时发出错误消息，方法是检查 </code> 是否受到适当限制。（这是一个很常见的技术，并没有错。）</p> <p>或者，我们可以将 </p> <pre><code>expr: expr PLUS expr

划分为两个互斥的备选方案：

expr: atom PLUS expr
expr: expr_not_an_atom PLUS expr

这也可以解决冲突，因为 atom 不能减少到 expr_not_an_atom。但它留下了如何定义 expr_not_an_atom.

的问题

碰巧的是，我很确定这是可能的，但这并不是微不足道的，其后果会波及语法。我也不能给你一个算法，因为 CFGs——不像正则表达式——不会在否定或集差下关闭。但基本上，您只需要级联非终结符，将它们拆分，以便每个替代方案都适合 atom 或 expr_not_an_atom——这也是一种合法的方法，但生成的语法可能很难理解阅读。

如果您使用的是 bison，您还有另一种选择：生成 GLR 语法。只要您的语言没有歧义，GLR 语法就会找到正确的解析，可能会稍微慢一些，但您的工作量会少很多。

如果有帮助，here's a slightly related answer 我在其中提出了一个完整的拆分非终端的解决方案。

Shift/reduce 与中缀部分冲突

Shift/reduce conflict with infix sections

yacc

lalr

infix-notation

ocamlyacc

shift-reduce-conflict