ANTLR4: How to match extra spaces at the beginning of a line?

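I am trying to match extra spaces at the beginning of a line, but without success. How can I change the lexer rules so that they match?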

TestParser.g4:

parser grammar TestParser;

options { tokenVocab=TestLexer; }

root
    : choice+ EOF
    ;

choice:
    QUESTION OPTION+;

TestLexer.g4:

lexer grammar TestLexer;

@lexer::members {
    private boolean aheadIsNotAnOption(IntStream _input) {
        int nextChar = _input.LA(1);
        return nextChar != 'A' && nextChar != 'B' && nextChar != 'C' && nextChar != 'D';
    }
}

QUESTION:                      {getCharPositionInLine() == 0}? DIGIT DOT CONTENT -> pushMode(OPTION_MODE);
OTHER:                         . -> skip;

mode OPTION_MODE;
OPTION:                        OPTION_HEADER DOT CONTENT;
NOT_OPTION_LINE:               NEWLINE SPACE* {aheadIsNotAnOption(_input)}? -> popMode, skip;
OPTION_OTHER:                  OTHER -> skip;

fragment DIGIT:                [0-9]+;
fragment OPTION_HEADER:        [A-D];
fragment CONTENT:              [a-zA-Z0-9 ,.'?/()!]+? {_input.LA(1) == '\n'}?;
fragment DOT:                  '.';
fragment NEWLINE:              '\n';
fragment SPACE:                ' ';

Input text:

1.title
A.aaa
B.bbb
 C.ccc
2.title
A.aaa

Java code:

import org.antlr.v4.runtime.CharStream;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.Lexer;
import org.antlr.v4.runtime.tree.ParseTree;

import java.io.IOException;
import java.net.URISyntaxException;

public class TestParseTest {

    public static void main(String[] args) throws URISyntaxException, IOException {
        CharStream charStream = CharStreams.fromString("1.title\n" +
                "A.aaa\n" +
                "B.bbb\n" +
                " C.ccc\n" +
                "2.title\n" +
                "A.aaa\n");
        Lexer lexer = new TestLexer(charStream);

        CommonTokenStream tokens = new CommonTokenStream(lexer);
        TestParser parser = new TestParser(tokens);
        ParseTree parseTree = parser.root();

        System.out.println(parseTree.toStringTree(parser));
    }

}

The output is:

(root (choice 1.title A.aaa B.bbb) (choice 2.title A.aaa) <EOF>)

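The idea is that whenever a non-option line is encountered in OPTION_MODE, the mode is popped. But when a line starts with an extra space, the lexer no longer behaves as expected.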

It looks like the \n before C.ccc matches NOT_OPTION_LINE, which pops the mode? I want C.ccc to be matched as an OPTION. Thanks.
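The guess appears to be right. As far as I can tell, the lexer explores every way `SPACE*` can match and evaluates `aheadIsNotAnOption` after each one. With zero spaces, `LA(1)` is the space itself, which is "not an option", so `"\n"` alone is a valid `NOT_OPTION_LINE` of length 1. `OPTION_OTHER` also matches one character, and ANTLR resolves equal-length matches in favor of the rule defined first, so `NOT_OPTION_LINE` wins and the mode is popped. The sketch below is a simplified, hypothetical model of that matching step (the class and method names are made up; it is not ANTLR's real ATN simulator):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of how NOT_OPTION_LINE
// (NEWLINE SPACE* {aheadIsNotAnOption(_input)}?) can match at a position.
// It enumerates every length SPACE* can take and checks the predicate after each.
class NotOptionLineModel {

    // Same check as aheadIsNotAnOption() in TestLexer.g4
    static boolean aheadIsNotAnOption(int nextChar) {
        return nextChar != 'A' && nextChar != 'B' && nextChar != 'C' && nextChar != 'D';
    }

    // Returns every token length for which NEWLINE SPACE* matches at `start`
    // AND the trailing predicate holds; the lexer would keep the longest one.
    static List<Integer> viableLengths(String input, int start) {
        List<Integer> lengths = new ArrayList<>();
        if (start >= input.length() || input.charAt(start) != '\n') {
            return lengths; // NEWLINE must come first
        }
        int end = start + 1; // just past the '\n'
        while (true) {
            // LA(1) after the text matched so far; -1 stands in for EOF
            int next = end < input.length() ? input.charAt(end) : -1;
            if (aheadIsNotAnOption(next)) {
                lengths.add(end - start);
            }
            if (next != ' ') {
                break; // SPACE* cannot be extended any further
            }
            end++;
        }
        return lengths;
    }

    public static void main(String[] args) {
        String input = "B.bbb\n C.ccc\n";
        // Only the zero-space alternative ("\n" followed by ' ') passes the predicate
        System.out.println(viableLengths(input, input.indexOf('\n'))); // prints [1]
    }
}
```

Without the leading space (`"B.bbb\nC.ccc\n"`) the list is empty, so `NOT_OPTION_LINE` does not match, `OPTION_OTHER` skips the `\n`, and `C.ccc` becomes an `OPTION`, which is consistent with the output above.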

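I think you're overcomplicating it. As I see it, a line starts either with a question ([ \t]* [0-9]+) or with an option ([ \t]* [A-Z]). In all other cases, just skip the input one character at a time (. -> skip). That boils down to the following grammar: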

lexer grammar TestLexer;

QuestionStart
 : {getCharPositionInLine() == 0}? [ \t]* [0-9]+ '.' -> pushMode(ContentMode)
 ;

OptionStart
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.' -> pushMode(ContentMode)
 ;

Ignored
 : . -> skip
 ;

mode ContentMode;

  Content
   : ~[\r\n]+
   ;

  QuestionEnd
   : [\r\n]+ -> skip, popMode
   ;
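To illustrate how the two modes cooperate, here is a hand-written Java approximation of this lexer. It is not ANTLR-generated code: the class and method names are hypothetical, and tokens are rendered as `"Type:text"` strings purely for readability. The `col` variable stands in for `getCharPositionInLine()`, which (like ANTLR) is reset only on `'\n'`.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written approximation of the two-mode TestLexer above (NOT ANTLR output).
class TwoModeLexerSketch {

    static boolean isLineBreak(char c) {
        return c == '\r' || c == '\n';
    }

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        boolean contentMode = false; // true after pushMode(ContentMode)
        int col = 0;                 // plays the role of getCharPositionInLine()
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (contentMode) {
                int j = i;
                if (isLineBreak(c)) {
                    // QuestionEnd : [\r\n]+ -> skip, popMode
                    while (j < src.length() && isLineBreak(src.charAt(j))) {
                        col = (src.charAt(j) == '\n') ? 0 : col + 1; // only '\n' resets the column
                        j++;
                    }
                    contentMode = false;
                } else {
                    // Content : ~[\r\n]+
                    while (j < src.length() && !isLineBreak(src.charAt(j))) j++;
                    tokens.add("Content:" + src.substring(i, j));
                    col += j - i;
                }
                i = j;
                continue;
            }
            if (col == 0) {
                int j = i; // skip the optional [ \t]* prefix
                while (j < src.length() && (src.charAt(j) == ' ' || src.charAt(j) == '\t')) j++;
                int k = j; // QuestionStart : [ \t]* [0-9]+ '.'
                while (k < src.length() && src.charAt(k) >= '0' && src.charAt(k) <= '9') k++;
                if (k > j && k < src.length() && src.charAt(k) == '.') {
                    tokens.add("QuestionStart:" + src.substring(i, k + 1));
                    col += k + 1 - i;
                    i = k + 1;
                    contentMode = true;
                    continue;
                }
                // OptionStart : [ \t]* [A-Z] '.'
                if (j + 1 < src.length() && src.charAt(j) >= 'A' && src.charAt(j) <= 'Z'
                        && src.charAt(j + 1) == '.') {
                    tokens.add("OptionStart:" + src.substring(i, j + 2));
                    col += j + 2 - i;
                    i = j + 2;
                    contentMode = true;
                    continue;
                }
            }
            // Ignored : . -> skip
            col = (c == '\n') ? 0 : col + 1;
            i++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String source = "1.title\nA.aaa\nB.bbb\n C.ccc\n  ...ignored ...\n2.title\nA.aaa\n";
        tokenize(source).forEach(System.out::println);
    }
}
```

Note that the leading space ends up inside the `OptionStart` token (`" C."`), because `[ \t]*` is part of that rule; the ignored line produces no tokens at all, since only column 0 can start a `QuestionStart` or `OptionStart`.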

The parser grammar could look like this:

parser grammar TestParser;

options { tokenVocab=TestLexer; }

root
 : question+ EOF
 ;

question
 : QuestionStart Content option+
 ;

option
 : OptionStart Content+
 ;
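To show how `question+`, `option+` and `Content+` group the token stream, here is a hypothetical hand-written recursive-descent equivalent of these three rules. It is not what ANTLR generates; it assumes tokens arrive as `"Type:text"` strings (an assumption made only for this sketch) and builds a string shaped like `toStringTree` output.

```java
import java.util.List;

// Hypothetical recursive-descent equivalent of root / question / option (NOT ANTLR output).
class QuestionParserSketch {
    private final List<String> tokens; // each token is "Type:text"
    private int pos = 0;

    QuestionParserSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    private String typeAt(int p) {
        String t = tokens.get(p);
        return t.substring(0, t.indexOf(':'));
    }

    private boolean at(String type) {
        return pos < tokens.size() && typeAt(pos).equals(type);
    }

    // Consumes a token of the given type and returns its text
    private String expect(String type) {
        if (!at(type)) throw new IllegalStateException("expected " + type + " at token " + pos);
        String t = tokens.get(pos++);
        return t.substring(t.indexOf(':') + 1);
    }

    // root : question+ EOF ;
    String root() {
        StringBuilder sb = new StringBuilder("(root");
        do {
            sb.append(' ').append(question());
        } while (at("QuestionStart"));
        if (pos != tokens.size()) throw new IllegalStateException("expected EOF at token " + pos);
        return sb.append(" <EOF>)").toString();
    }

    // question : QuestionStart Content option+ ;
    private String question() {
        StringBuilder sb = new StringBuilder("(question ");
        sb.append(expect("QuestionStart")).append(' ').append(expect("Content"));
        do {
            sb.append(' ').append(option());
        } while (at("OptionStart"));
        return sb.append(')').toString();
    }

    // option : OptionStart Content+ ;
    private String option() {
        StringBuilder sb = new StringBuilder("(option ");
        sb.append(expect("OptionStart"));
        do {
            sb.append(' ').append(expect("Content"));
        } while (at("Content"));
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("QuestionStart:1.", "Content:title",
                "OptionStart:A.", "Content:aaa", "OptionStart: C.", "Content:ccc");
        System.out.println(new QuestionParserSketch(tokens).root());
    }
}
```

A question without any option makes `option+` fail here with an exception, just as the real parser would report a syntax error.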

And the Java code:

String source = "1.title\n" +
    "A.aaa\n" +
    "B.bbb\n" +
    " C.ccc\n" +
    "  ...ignored ...\n" +
    "2.title\n" +
    "A.aaa\n";

Lexer lexer = new TestLexer(CharStreams.fromString(source));

CommonTokenStream tokens = new CommonTokenStream(lexer);
TestParser parser = new TestParser(tokens);
ParseTree parseTree = parser.root();

System.out.println(parseTree.toStringTree(parser));

which will then print:

(root (question 1. title (option A. aaa) (option B. bbb) (option  C. ccc)) (question 2. title (option A. aaa)) <EOF>)

Edit

Since your grammar already contains target-specific code, you could trim the spaces from options like this (untested!):

OptionStart
 : {getCharPositionInLine() == 0}? [ \t]* [A-Z] '.'
   {setText(getText().trim());}
   -> pushMode(ContentMode)
 ;
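The action relies on plain `java.lang.String.trim()`, which strips leading and trailing characters up to and including `' '`, so both spaces and tabs captured by `[ \t]*` would be removed. The tiny demo below only shows what `trim()` does to a token text like `" C."`; the helper name is hypothetical, and whether the lexer action behaves as intended is still untested, as noted above. If it does, the tree would presumably print `(option C. ccc)` instead of `(option  C. ccc)`.

```java
// Plain-Java illustration of the String.trim() call used in the lexer action (not ANTLR code).
class TrimDemo {
    // Hypothetical helper: what setText(getText().trim()) would store for OptionStart
    static String trimmedOptionStart(String tokenText) {
        return tokenText.trim();
    }

    public static void main(String[] args) {
        System.out.println("[" + trimmedOptionStart(" C.") + "]");   // prints [C.]
        System.out.println("[" + trimmedOptionStart("\t C.") + "]"); // prints [C.]
    }
}
```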