LR(1) Shift/Reduce 消歧义
LR(1) Shift/Reduce Disambiguation
给定重复 BLOCK
s 的输入,其中每个块都有重复的 BEGIN EVENT
和 END EVENT
条目(END EVENT
总是跟在 BEGIN EVENT
之后):
[TIMESTAMP] BLOCK
[TIMESTAMP] BEGIN EVENT
[TIMESTAMP] END EVENT
[TIMESTAMP] BEGIN EVENT
[TIMESTAMP] END EVENT
...
[TIMESTAMP] BLOCK
你如何消除这个语法与 LR(1) 的歧义?我正在使用 LALRPOP,最小的例子是:
Timestamp = "[TIMESTAMP]";
BlockHeader = Timestamp "BLOCK";
Begin = Timestamp "BEGIN" "EVENT";
End = Timestamp "END" "EVENT";
Block = BlockHeader (Begin End)+;
pub Blocks = Block*
因为 LR(1) 只能向前看一个记号,所以这个语法是有歧义的,正如 LALRPOP 有用地告诉你的(部分错误):
Local ambiguity detected
The problem arises after having observed the following symbols in the input:
BlockHeader (Begin End)+
At that point, if the next token is a `"[TIMESTAMP]"`, then the parser can proceed in two different ways.
First, the parser could execute the production at
/home/<snip>.lalrpop:51:9: 51:32, which would consume
the top 2 token(s) from the stack and produce a `Block`. This might then yield a parse tree like
BlockHeader (Begin End)+ Block
├─Block────────────────┤ │
├─Block+───────────────┘ │
└─Block+─────────────────────┘
Alternatively, the parser could shift the `"[TIMESTAMP]"` token and later use it to construct a
`Timestamp`. This might then yield a parse tree like
(Begin End)+ "[TIMESTAMP]" "BEGIN" "EVENT" End
│ ├─Timestamp─┘ │ │
│ └─Begin─────────────────────┘ │
└─(Begin End)+───────────────────────────────┘
我看到它告诉我,在解析 BlockHeader、Begin 和 End 之后,它无法确定下一个标记是另一个 Begin 还是另一个 Block 的开始。我还没有找到在 LR(1) 中消除歧义的方法,但我只能假设这是我缺乏理解,而不是 LR(1) 语法的继承限制?
不幸的是,如果不完全重组语法,这种 'need more lookahead' 问题很难解决,这通常会丢失输入的理想结构,有时会接受原始语法会拒绝的退化输入。您通常可以拒绝这些输入并通过 post 处理解析树来取回该结构,但这需要更多工作。在你的情况下,语法:
Timestamp = "[TIMESTAMP]";
BlockHeader = Timestamp "BLOCK";
Begin = Timestamp "BEGIN" "EVENT";
End = Timestamp "END" "EVENT";
Event = Begin End;
Item = BlockHeader | Event;
pub Input = Item*
应该可以解决这个问题,但问题是它丢失了块结构(而不是给你一个非结构化的块头和事件序列),并且它接受空块。您可以通过 post-处理项目列表轻松处理这两个问题。
当所需的前瞻性较小且有界时,另一种选择是在标记器中处理它。我对 LALRPOP 不熟悉,但应该可以 "combine" [TIMESTAMP]
带有紧随其后的关键字标记的标记(因此时间戳不会出现在语法中,而只是作为属性关键字),在这种情况下,单标记前瞻一切都会正常工作。
给定重复 BLOCK
s 的输入,其中每个块都有重复的 BEGIN EVENT
和 END EVENT
条目(END EVENT
总是跟在 BEGIN EVENT
之后):
[TIMESTAMP] BLOCK
[TIMESTAMP] BEGIN EVENT
[TIMESTAMP] END EVENT
[TIMESTAMP] BEGIN EVENT
[TIMESTAMP] END EVENT
...
[TIMESTAMP] BLOCK
你如何消除这个语法与 LR(1) 的歧义?我正在使用 LALRPOP,最小的例子是:
Timestamp = "[TIMESTAMP]";
BlockHeader = Timestamp "BLOCK";
Begin = Timestamp "BEGIN" "EVENT";
End = Timestamp "END" "EVENT";
Block = BlockHeader (Begin End)+;
pub Blocks = Block*
因为 LR(1) 只能向前看一个记号,所以这个语法是有歧义的,正如 LALRPOP 有用地告诉你的(部分错误):
Local ambiguity detected
The problem arises after having observed the following symbols in the input:
BlockHeader (Begin End)+
At that point, if the next token is a `"[TIMESTAMP]"`, then the parser can proceed in two different ways.
First, the parser could execute the production at
/home/<snip>.lalrpop:51:9: 51:32, which would consume
the top 2 token(s) from the stack and produce a `Block`. This might then yield a parse tree like
BlockHeader (Begin End)+ Block
├─Block────────────────┤ │
├─Block+───────────────┘ │
└─Block+─────────────────────┘
Alternatively, the parser could shift the `"[TIMESTAMP]"` token and later use it to construct a
`Timestamp`. This might then yield a parse tree like
(Begin End)+ "[TIMESTAMP]" "BEGIN" "EVENT" End
│ ├─Timestamp─┘ │ │
│ └─Begin─────────────────────┘ │
└─(Begin End)+───────────────────────────────┘
我看到它告诉我,在解析 BlockHeader、Begin 和 End 之后,它无法确定下一个标记是另一个 Begin 还是另一个 Block 的开始。我还没有找到在 LR(1) 中消除歧义的方法,但我只能假设这是我缺乏理解,而不是 LR(1) 语法的继承限制?
不幸的是,如果不完全重组语法,这种 'need more lookahead' 问题很难解决,这通常会丢失输入的理想结构,有时会接受原始语法会拒绝的退化输入。您通常可以拒绝这些输入并通过 post 处理解析树来取回该结构,但这需要更多工作。在你的情况下,语法:
Timestamp = "[TIMESTAMP]";
BlockHeader = Timestamp "BLOCK";
Begin = Timestamp "BEGIN" "EVENT";
End = Timestamp "END" "EVENT";
Event = Begin End;
Item = BlockHeader | Event;
pub Input = Item*
应该可以解决这个问题,但问题是它丢失了块结构(而不是给你一个非结构化的块头和事件序列),并且它接受空块。您可以通过 post-处理项目列表轻松处理这两个问题。
当所需的前瞻性较小且有界时,另一种选择是在标记器中处理它。我对 LALRPOP 不熟悉,但应该可以 "combine" [TIMESTAMP]
带有紧随其后的关键字标记的标记(因此时间戳不会出现在语法中,而只是作为属性关键字),在这种情况下,单标记前瞻一切都会正常工作。