Antlr 4 Lexer with multiple modes failing to tokenise correctly

Question

I'm trying to create a lexer with multiple modes using Antlr 4.7. My lexer is currently:

lexer grammar ActionOnlyLexer;

ACTIONONLY  : 'AO';

BELIEFS :   ':Initial Beliefs:' -> mode(INITIAL_BELIEFS);
NAME    :   ':name:';
WORD:   ('a'..'z'|'A'..'Z'|'0'..'9'|'_')+;

COMMENT : '/*' .*? '*/' -> skip ;
LINE_COMMENT : '//' ~[\n]* -> skip ;
NEWLINE:'\r'? '\n' -> skip  ;
WS  :   (' '|'\t') -> skip ;

mode INITIAL_BELIEFS;
GOAL_IB :   ':Initial Goal:' -> mode(GOALS);
IB_COMMENT : '/*' .*? '*/' -> skip ;
IB_LINE_COMMENT : '//' ~[\n]* -> skip ;
IB_NEWLINE:'\r'? '\n' -> skip  ;
IB_WS  :   (' '|'\t') -> skip ;
BELIEF_BLOCK: ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'('|')'|','|'.')+;

mode REASONING_RULES;
R1: 'a';
R2: 'b';

mode GOALS;
GL_COMMENT : '/*' .*? '*/' -> skip ;
GL_LINE_COMMENT : '//' ~[\n]* -> skip ;
GL_NEWLINE:'\r'? '\n' -> skip  ;
GL_WS  :   (' '|'\t') -> skip ;
GOAL_BLOCK: ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'('|')'|','|'.')+;

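Note that there is currently no way to enter the REASONING_RULES mode, so as far as I understand it should have no effect on how the lexer operates. Obviously I do want to use this mode eventually, but this is the minimal version of the lexer that still seems to show the problem I'm having.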
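The expectation above, that only the rules of the current mode take part in matching, can be sketched as a toy model in plain Java. This is not ANTLR's implementation: `ModeSketch` and `Rule` are hypothetical names, the comment rules are omitted, and the whitespace and newline rules are merged into one. It assumes the usual lexer behaviour of taking the longest match and letting the first listed rule win ties.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ModeSketch {
    /** One simplified lexer rule; nextMode == null means "stay in the current mode". */
    static final class Rule {
        final String name;
        final Pattern pattern;
        final String nextMode;
        final boolean skip;

        Rule(String name, String regex, String nextMode, boolean skip) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
            this.nextMode = nextMode;
            this.skip = skip;
        }
    }

    // Simplified copy of the grammar's modes (hypothetical stand-in, comment rules left out).
    static final Map<String, List<Rule>> MODES = new LinkedHashMap<>();
    static {
        MODES.put("DEFAULT_MODE", List.of(
                new Rule("ACTIONONLY", "AO", null, false),
                new Rule("BELIEFS", ":Initial Beliefs:", "INITIAL_BELIEFS", false),
                new Rule("NAME", ":name:", null, false),
                new Rule("WORD", "[a-zA-Z0-9_]+", null, false),
                new Rule("WS", "[ \\t\\r\\n]+", null, true)));
        MODES.put("INITIAL_BELIEFS", List.of(
                new Rule("GOAL_IB", ":Initial Goal:", "GOALS", false),
                new Rule("IB_WS", "[ \\t\\r\\n]+", null, true),
                new Rule("BELIEF_BLOCK", "[a-zA-Z0-9_(),.]+", null, false)));
        // Never entered: no rule switches to this mode, so its rules are never tried.
        MODES.put("REASONING_RULES", List.of(
                new Rule("R1", "a", null, false),
                new Rule("R2", "b", null, false)));
        MODES.put("GOALS", List.of(
                new Rule("GL_WS", "[ \\t\\r\\n]+", null, true),
                new Rule("GOAL_BLOCK", "[a-zA-Z0-9_(),.]+", null, false)));
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        String mode = "DEFAULT_MODE";
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestEnd = pos;
            // Only rules of the current mode are candidates; longest match wins,
            // and the strict '>' keeps the earlier rule on a tie.
            for (Rule rule : MODES.get(mode)) {
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = rule;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalStateException(
                        "no rule in mode " + mode + " matches at offset " + pos);
            }
            if (!best.skip) {
                tokens.add(best.name + " " + input.substring(pos, bestEnd));
            }
            if (best.nextMode != null) {
                mode = best.nextMode;
            }
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String source = "AO\n\n:name: robot\n\n:Initial Beliefs:\n\nabelief\n\n"
                + ":Initial Goal:\n\nat(4, 2)";
        tokenize(source).forEach(System.out::println);
    }
}
```

In this model the rules R1 and R2 can never influence the result, because nothing ever sets the mode to REASONING_RULES.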

My parser is:

grammar ActionOnly;

options { tokenVocab = ActionOnlyLexer; }

// Mas involving ActionOnly Agents
mas  :  aoagents;

aoagents: ACTIONONLY (aoagent)+;

// Agent stuff
aoagent  : 
    (ACTIONONLY?) 
    NAME w=WORD  
    BELIEFS (bs=BELIEF_BLOCK )?
    GOAL_IB gs=GOAL_BLOCK;

I'm trying to parse:

AO

:name: robot

:Initial Beliefs:

abelief

:Initial Goal:

at(4, 2)

This fails with the error

line 35:0 mismatched input 'at(4,' expecting GOAL_BLOCK

I assume this is because it isn't tokenising correctly.

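If I omit rule R2 from the REASONING_RULES mode then it parses correctly. In general I seem to be able to have one rule in REASONING_RULES and it works, but with more than one rule it fails to match GOAL_BLOCK.

I'm really struggling to see what I'm doing wrong here, but this is the first time I've tried to use lexer modes in Antlr.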

Answer 1

I don't get that error when I try your grammar. I also tested with ANTLR 4.7.

Here's my test rig:

import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.ParserRuleContext;
import org.antlr.v4.runtime.Token;

public class Main {

    public static void main(String[] args) {

        String source = "AO\n" +
                "\n" +
                ":name: robot\n" +
                "\n" +
                ":Initial Beliefs:\n" +
                "\n" +
                "abelief\n" +
                "\n" +
                ":Initial Goal:\n" +
                "\n" +
                "at(4, 2)";

        ActionOnlyLexer lexer = new ActionOnlyLexer(CharStreams.fromString(source));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill();

        System.out.println("[TOKENS]");

        for (Token t : tokens.getTokens()) {
            System.out.printf("  %-20s %s\n", ActionOnlyLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
        }

        System.out.println("\n[PARSE-TREE]");

        ActionOnlyParser parser = new ActionOnlyParser(tokens);
        ParserRuleContext context = parser.mas();

        System.out.println("  "+context.toStringTree(parser));
    }
}

This is printed to my console:

[TOKENS]
  ACTIONONLY           AO
  NAME                 :name:
  WORD                 robot
  BELIEFS              :Initial Beliefs:
  BELIEF_BLOCK         abelief
  GOAL_IB              :Initial Goal:
  GOAL_BLOCK           at(4,
  GOAL_BLOCK           2)
  EOF                  <EOF>

[PARSE-TREE]
  (mas (aoagents AO (aoagent :name: robot :Initial Beliefs: abelief :Initial Goal: at(4,)))

Maybe you need to regenerate the lexer/parser classes?

PS. Note that ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'('|')'|','|'.')+ can be written as [a-zA-Z0-9_(),.]+
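As a quick sanity check of that character set, here is a minimal sketch using `java.util.regex` rather than ANTLR itself (the class name `GoalBlockCheck` is made up for illustration). It also suggests why the token dump above shows two GOAL_BLOCK tokens for `at(4, 2)`: the space is not in the set, so the match stops after the comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GoalBlockCheck {
    // Same character set as GOAL_BLOCK, written in java.util.regex syntax.
    static final Pattern BLOCK = Pattern.compile("[a-zA-Z0-9_(),.]+");

    /** Returns every maximal run of GOAL_BLOCK characters found in the text. */
    static List<String> blocks(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = BLOCK.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // With the space the input splits into two runs; without it there is one.
        for (String input : new String[] {"at(4, 2)", "at(4,2)"}) {
            System.out.println("input: " + input);
            for (String block : blocks(input)) {
                System.out.println("  block: " + block);
            }
        }
    }
}
```

If the whole goal is meant to be a single token, one option would be to write it without the space, or to allow spaces inside the block. Either change is an assumption about what the grammar is intended to accept.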
