Unexpected behaviour of Antlr4 lexer after invoking 'more' command

Question

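One of the rules in my lexer grammar uses the 'more' command. As a result, a single-character token matches multi-character text, which should not happen unless I am missing something. Here is the grammar: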

lexer grammar MyLexer;
    StartQuote
        : '"'
        -> pushMode(BeforeTextMode)
        ;
mode BeforeTextMode;
    SwitchToTextMode
        : .
        -> more, mode(TextMode)
        ;
mode TextMode;
    Text
        : ~'"'+
        ;
    EndQuote
        : '"'
        -> popMode
        ;

Here is the test program:

using System;
using Antlr4.Runtime;

class Program
{
    static string InputText1 = "\"x\"";
    static string InputText2 = "\"xy\"";

    // Indexed by token type - 1. SwitchToTextMode uses 'more', so ANTLR assigns it no
    // token type: StartQuote = 1, Text = 2, EndQuote = 3 (check the generated MyLexer.tokens).
    static string[] TokenTypeNames = new string[] { "StartQuote", "Text", "EndQuote" };

    static void Main(string[] args)
    {
        string TokenSequence1 = GetTokenSequence(InputText1);
        string TokenSequence2 = GetTokenSequence(InputText2);

        Console.WriteLine(TokenSequence1);
        Console.WriteLine(TokenSequence2);
    }

    static string GetTokenSequence(string InputText)
    {
        var Lexer = new MyLexer(new AntlrInputStream(InputText));
        string TokenSequence = "";
        // Token type -1 is EOF; stop there.
        for (var Token = Lexer.NextToken(); Token.Type != -1; Token = Lexer.NextToken())
            TokenSequence += TokenTypeNames[Token.Type - 1] + "(" + Token.Text + ")" + " ";
        return TokenSequence;
    }
}

Output:

StartQuote(") EndQuote(x")
StartQuote(") Text(xy) EndQuote(")

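As the output shows, the single-character EndQuote token matches multi-character text. This happens only when the input contains exactly one character between the quotes.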

Could you take a look and tell me whether I am missing something here, or whether this is indeed a bug in Antlr4?

Answer 1

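The 'more' command causes the matched text to be prepended to the text of the next token that is actually emitted. For the input "x", the dot matches and consumes the x. The next character is the closing quote, which Text (~'"'+, one or more non-quote characters) cannot match, so no Text token is produced.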

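The first token emitted after the dot match is therefore the end-quote token, and it ends up with the text x". For "xy" the x is carried over in the same way, but Text can then match the y, so it is emitted as Text(xy) and the final EndQuote gets only the quote; that is why the second output looks correct.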
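To make the sequence above concrete, here is a toy Python model of this grammar's modes and of the 'more' command. It is a sketch under stated assumptions, not ANTLR's implementation: tokenize, RULES and the matcher helpers are hypothetical names, and the model only covers what this grammar needs (longest match with the earlier rule winning ties, more, mode, pushMode, popMode). The point it illustrates is that 'more' leaves the start of the pending token where it was, so the next emitted token's text begins at the x.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy model: a matcher returns how many characters it matches at i (0 = none).
Matcher = Callable[[str, int], int]


def literal(ch: str) -> Matcher:
    # Like '"' in the grammar: exactly this one character.
    return lambda s, i: 1 if i < len(s) and s[i] == ch else 0


def any_char() -> Matcher:
    # Like '.' in the grammar: any single character.
    return lambda s, i: 1 if i < len(s) else 0


def not_char_plus(ch: str) -> Matcher:
    # Like ~'"'+ : one or more characters other than ch (0 if there are none).
    def match(s: str, i: int) -> int:
        j = i
        while j < len(s) and s[j] != ch:
            j += 1
        return j - i
    return match


# Rules per mode, in grammar order: (name, matcher, commands).
RULES: Dict[str, List[Tuple[str, Matcher, List[Tuple[str, ...]]]]] = {
    "DEFAULT": [("StartQuote", literal('"'), [("pushMode", "BeforeTextMode")])],
    "BeforeTextMode": [
        ("SwitchToTextMode", any_char(), [("more",), ("mode", "TextMode")]),
    ],
    "TextMode": [
        ("Text", not_char_plus('"'), []),
        ("EndQuote", literal('"'), [("popMode",)]),
    ],
}


def tokenize(text: str) -> List[Tuple[str, str]]:
    mode, stack = "DEFAULT", []
    pos = 0
    start = 0  # where the text of the next emitted token begins
    tokens = []
    while pos < len(text):
        best = None
        for name, matcher, commands in RULES[mode]:
            n = matcher(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if n > 0 and (best is None or n > best[1]):
                best = (name, n, commands)
        if best is None:
            raise ValueError(f"no rule matches at {pos} in mode {mode}")
        name, n, commands = best
        pos += n
        more = False
        for cmd in commands:
            if cmd[0] == "more":
                more = True
            elif cmd[0] == "mode":
                mode = cmd[1]
            elif cmd[0] == "pushMode":
                stack.append(mode)
                mode = cmd[1]
            elif cmd[0] == "popMode":
                mode = stack.pop()
        if not more:
            # Emit one token covering everything since `start`, including
            # text matched earlier by rules that used 'more'.
            tokens.append((name, text[start:pos]))
            start = pos
    return tokens


for src in ['"x"', '"xy"']:
    print(" ".join(f"{name}({txt})" for name, txt in tokenize(src)))
```

It prints the same two token sequences as the C# program, minus the trailing space.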

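Incidentally, this behavior is what lets several consecutive rule matches that use the 'more' command accumulate their text into the single token that is finally emitted.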
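The accumulation across consecutive 'more' matches can be isolated in an even smaller sketch. It assumes the lexer has already decided which rule matched which text and ignores modes entirely; emit_tokens and the comment rule names in the example are hypothetical. Each match whose rule uses 'more' only extends a pending buffer, and the first match without 'more' emits one token carrying the whole buffer, named after that last rule in this model.

```python
from typing import List, Tuple


def emit_tokens(matches: List[Tuple[str, str, bool]]) -> List[Tuple[str, str]]:
    """matches: (rule_name, matched_text, uses_more) in the order the lexer found them."""
    tokens: List[Tuple[str, str]] = []
    pending = ""  # text accumulated by consecutive 'more' matches
    for rule, text, uses_more in matches:
        pending += text
        if not uses_more:
            # The emitted token takes the last rule's name and all pending text.
            tokens.append((rule, pending))
            pending = ""
    return tokens


# Hypothetical rules that lex a block comment in three steps chained with 'more'.
matches = [
    ("Id", "a", False),
    ("CommentStart", "/*", True),
    ("CommentBody", " hi ", True),
    ("CommentEnd", "*/", False),
    ("Id", "b", False),
]
print(emit_tokens(matches))  # → [('Id', 'a'), ('CommentEnd', '/* hi */'), ('Id', 'b')]
```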
