antlr4 rule not ignoring standalone open bracket

Question

The situation:

rule   : block+ ;
block  : '[' String ']' ;
String : ([a-z] | '[' | '\]')+ ;

The catch is that String may contain [ without a backslash escape, but ] only when it is escaped with a backslash, so in this example:

[hello\]world][hello[[world]

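The first block is parsed correctly, but the second is not: the parser tries to find a ] for every [. Is there a way to make the antlr parser ignore these standalone [ characters? I can't change the format, but I need to find some workaround using antlr.

PS: Without antlr there are algorithms that avoid this, for example: collect each [ in a queue, then find the first ] and use only the head of the queue. But I really need antlr =_=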
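
For comparison, the hand-written idea from the PS can be sketched in plain Java roughly as follows. This is only an illustration of the non-antlr approach: the class name `BracketBlocks` and the method `split` are made up for this sketch, it assumes the input consists solely of consecutive blocks as in the example, and unlike the grammar it does not restrict the content to [a-z].

```java
import java.util.ArrayList;
import java.util.List;

public class BracketBlocks {
    // Returns the content of each block. The first '[' opens a block, any
    // later '[' is plain content, and the first ']' that is not preceded
    // by a backslash closes the block.
    public static List<String> split(String input) {
        List<String> blocks = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (input.charAt(i) != '[') {
                throw new IllegalArgumentException("expected '[' at index " + i);
            }
            i++; // skip the opening bracket
            StringBuilder content = new StringBuilder();
            boolean closed = false;
            while (i < input.length()) {
                char c = input.charAt(i);
                if (c == '\\' && i + 1 < input.length() && input.charAt(i + 1) == ']') {
                    content.append("\\]"); // an escaped ']' stays in the content
                    i += 2;
                } else if (c == ']') {
                    closed = true; // first unescaped ']' ends the block
                    i++;
                    break;
                } else {
                    content.append(c); // includes standalone '['
                    i++;
                }
            }
            if (!closed) {
                throw new IllegalArgumentException("unterminated block");
            }
            blocks.add(content.toString());
        }
        return blocks;
    }

    public static void main(String[] args) {
        // The example input from the question, written as a Java literal.
        System.out.println(split("[hello\\]world][hello[[world]"));
    }
}
```

For the example input this yields two blocks, whose contents are hello\]world and hello[[world.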

Answer 1

You can use lexer modes.

Lexical modes allow us to split a single lexer grammar into multiple sublexers. The lexer can only return tokens matched by rules from the current mode.

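You can read more about lexer rules in the antlr documentation.

First, you need to split the grammar into a lexer grammar and a parser grammar, because modes are only allowed in lexer grammars. Then switch to another mode as soon as the left bracket has been seen, so that every further [ is matched as part of the string.

Parser grammar: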

parser grammar TestParser;

options { tokenVocab=TestLexer; }

rul   : block+ ;
block  : LBR STRING RBR ;

Lexer grammar:

lexer grammar TestLexer;

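// Default mode: an opening bracket starts a block and switches to InString.
LBR: '[' -> pushMode(InString);

mode InString;

// Inside a block, '[' is ordinary content. '\\]' matches a backslash followed
// by ']' (newer antlr versions flag '\]' as an invalid escape sequence).
STRING : ([a-z] | '\\]' | '[')+ ;
// The first unescaped ']' closes the block and returns to the default mode.
RBR: ']' -> popMode;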

A working example is here.

You can also read the documentation on lexer modes.
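
To make the effect of pushMode and popMode more concrete, here is a minimal plain-Java sketch that imitates the two-mode lexer by hand. It does not use the antlr runtime: the class `ModeLexerSketch`, its `tokenize` method and the "TYPE:text" output format are invented for illustration, and the generated lexer will differ in its details (error reporting in particular).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ModeLexerSketch {
    enum Mode { DEFAULT, IN_STRING }

    // Produces tokens as "TYPE:text" strings, mimicking LBR, STRING and RBR.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Deque<Mode> modes = new ArrayDeque<>();
        modes.push(Mode.DEFAULT);
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (modes.peek() == Mode.DEFAULT) {
                // Default mode only knows LBR; matching it enters IN_STRING.
                if (c != '[') {
                    throw new IllegalArgumentException("unexpected '" + c + "' at index " + i);
                }
                tokens.add("LBR:[");
                modes.push(Mode.IN_STRING); // like pushMode(InString)
                i++;
            } else if (c == ']') {
                tokens.add("RBR:]");
                modes.pop(); // like popMode
                i++;
            } else {
                // STRING: one or more of [a-z], '[' or a backslash-escaped ']'.
                int start = i;
                while (i < input.length()) {
                    char d = input.charAt(i);
                    if (d == '\\' && i + 1 < input.length() && input.charAt(i + 1) == ']') {
                        i += 2;
                    } else if (d == '[' || (d >= 'a' && d <= 'z')) {
                        i++;
                    } else {
                        break;
                    }
                }
                if (i == start) {
                    throw new IllegalArgumentException("unexpected '" + c + "' at index " + i);
                }
                tokens.add("STRING:" + input.substring(start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The second block from the question.
        System.out.println(tokenize("[hello[[world]"));
    }
}
```

Because the [ characters after the first one are consumed while the sketch is in the string mode, they never show up as separate LBR tokens, which is why the parser rule block : LBR STRING RBR no longer looks for a matching ] for each of them.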

java

antlr

antlr4