ANTLR4: handling continuations for "any data"

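The grammar I need to create is based on the following:

  1. Command lines start with a slash
  2. Command lines can be continued with a hyphen as the last character (not counting blanks) of a line
  3. For some commands I want to parse their parameters
  4. For other commands I am not interested in their parameters

This almost works with the following (simplified) lexer: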

lexer grammar T1Lexer;

NewLine
 : [\r\n]+ -> skip
 ;

CommandStart
 : '/' -> pushMode(CommandMode)
 ;

DataStart
 : . -> more, pushMode(DataMode)
 ;

mode DataMode;

 DataLine
  : ~[\r\n]+ -> popMode
  ;

mode CommandMode;

  CmNL
  : [\r\n]+ -> skip, popMode
  ;

  CONTINUEMINUS :     ( '-' [ ]* ('\r/' | '\n/' | '\r\n/') ) -> channel(HIDDEN);
  EOL: ( [ ]* ('\r' | '\n' | '\r\n') ) -> popMode;

  SPACE :        [ \t\r\n]+    -> channel(HIDDEN) ;
  DOT :          [.] ;
  COMMA :        ',' ;

  CMD1 :        'CMD1';
  CMD2 :        'CMD2';
  CMDIGN :      'CMDIGN' -> pushMode(DataMode) ;

  VAR1 :        'VAR1=' ;

  ID :                  ID_LITERAL;

fragment ID_LITERAL:                 [A-Z_$0-9]*?[A-Z_$]+?[A-Z_$0-9]*;

and parser:

parser grammar T1Parser;

options { tokenVocab=T1Lexer; }

root :  line+ EOF   ;

line:   ( commandLine | dataLine)+  ;

dataLine :  DataLine    ;

commandLine :   CommandStart command    ;

command : cmd1 | cmd2 | cmdign ;

cmd1 :    CMD1 (VAR1 ID)+ ;
cmd2 :    CMD2 (VAR1 ID)+ ;
cmdign :    CMDIGN DataLine ;

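The problem arises where I need the combination of 2. + 4., i.e. the continuation of a command whose parameters I simply want to receive as an unparsed string (lines 5+6 in the example).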

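When I push DataMode for CMDIGN on line 5, the continuation character is not recognized because it is swallowed by the "anything until EOL" rule. The lexer then falls back to the default mode, and the continuation line is treated as a new command, which cannot be parsed.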
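
The swallowing can be reproduced outside ANTLR. The following Python sketch is only an approximation: it assumes that the regular expression `[^\r\n]+` behaves like the `DataLine` rule `~[\r\n]+`, and both the sample input and the name `DATA_LINE_ORIGINAL` are made up for illustration.

```python
import re

# Assumed regex equivalent of the lexer rule  DataLine : ~[\r\n]+ ;
DATA_LINE_ORIGINAL = re.compile(r"[^\r\n]+")

# Hypothetical input: the text following CMDIGN, continued on the next line.
text = " VAR1=A -\n/ VAR2=B\n"

match = DATA_LINE_ORIGINAL.match(text)
consumed = match.group(0)
remainder = text[match.end():]

print(repr(consumed))   # the hyphen ends up inside the data token
print(repr(remainder))  # the continuation line is left over, starting with '/'
```

Because the hyphen is consumed as ordinary data, nothing is left to mark the next line as a continuation.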

Is there a way to handle this combination properly?

TIA - Alex

(By way of example) You don't really need CommandMode; it actually complicates things a bit.

T1Lexer.g4:

lexer grammar T1Lexer
    ;

CMD_START: '/';

CONTINUE_EOL_SLASH: '-' EOL_F '/' -> channel(HIDDEN);
EOL:                EOL_F;

WS:    [ \t]+ -> channel(HIDDEN);
DOT:   [.];
COMMA: ',';

CMD1:   'CMD1';
CMD2:   'CMD2';
CMDIGN: 'CMDIGN' -> pushMode(DataMode);

VAR1: 'VAR1=';

ID: ID_LITERAL;

//=======================================
mode DataMode
    ;

DM_EOL:    EOL_F -> type(EOL), popMode;
DATA_LINE: ( ~[\r\n]*? '-' EOL_F)* ~[\r\n]+;

//=======================================
fragment NL:         '\r'? '\n';
fragment EOL_F:      [ ]* NL;
fragment ID_LITERAL: [A-Z_$0-9]*? [A-Z_$]+? [A-Z_$0-9]*;
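
To get a feel for what `DATA_LINE` accepts, here is a rough Python sketch. It assumes that the regular expression below is a faithful transcription of `( ~[\r\n]*? '-' EOL_F)* ~[\r\n]+` with the `EOL_F` and `NL` fragments inlined. The names `DATA_LINE_RE` and `data_line` are illustrative only, and Python's backtracking matcher is not the ANTLR lexer, so treat the result as an approximation.

```python
import re

# Assumed transcription of the lexer rule, with EOL_F = [ ]* '\r'? '\n' inlined:
#   DATA_LINE: ( ~[\r\n]*? '-' EOL_F )* ~[\r\n]+ ;
DATA_LINE_RE = re.compile(r"(?:[^\r\n]*?-[ ]*\r?\n)*[^\r\n]+")


def data_line(text: str) -> str:
    """Return the prefix of text that this sketch treats as one DATA_LINE."""
    match = DATA_LINE_RE.match(text)
    return match.group(0) if match else ""


# A plain line stops at the end of the line.
print(repr(data_line(" VAR1=BLAH VAR2=BLAH\n/ CMD2\n")))

# A hyphen followed only by blanks and a newline pulls in the next line;
# the hyphen inside "VAR-1" does not, because no newline follows it.
print(repr(data_line(" VAR-1=0 -   \n/        VAR2=notignored\n")))
```

Note that the matched text still contains the hyphen, the newline and the leading slash of the continuation line, so any clean-up of those characters would have to happen after lexing.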

T1Parser.g4:

parser grammar T1Parser
    ;

options {
    tokenVocab = T1Lexer;
}

root: line (EOL line)* EOL? EOF;

line: commandLine | dataLine | emptyLine;

dataLine: DATA_LINE;

commandLine: CMD_START command;

emptyLine: CMD_START;

command: cmd1 | cmd2 | cmdign;

cmd1:   CMD1 (VAR1 ID)+;
cmd2:   CMD2 (VAR1 ID)+;
cmdign: CMDIGN DATA_LINE?;

Test input:

/ CMD1 VAR1=VAL1 VAR1=VAL2
/ CMDIGN VAR1=BLAH VAR2=BLAH
/ CMD2 VAR1=VAL12 -
/      VAR1=VAL22
/ CMDIGN
/
/ CMDIGN VAR-1=0 -   
/        VAR2=notignored

Token stream:

[@0,0:0='/',<'/'>,1:0]
[@1,1:1=' ',<WS>,channel=1,1:1]
[@2,2:5='CMD1',<'CMD1'>,1:2]
[@3,6:6=' ',<WS>,channel=1,1:6]
[@4,7:11='VAR1=',<'VAR1='>,1:7]
[@5,12:15='VAL1',<ID>,1:12]
[@6,16:16=' ',<WS>,channel=1,1:16]
[@7,17:21='VAR1=',<'VAR1='>,1:17]
[@8,22:25='VAL2',<ID>,1:22]
[@9,26:26='\n',<EOL>,1:26]
[@10,27:27='/',<'/'>,2:0]
[@11,28:28=' ',<WS>,channel=1,2:1]
[@12,29:34='CMDIGN',<'CMDIGN'>,2:2]
[@13,35:54=' VAR1=BLAH VAR2=BLAH',<DATA_LINE>,2:8]
[@14,55:55='\n',<EOL>,2:28]
[@15,56:56='/',<'/'>,3:0]
[@16,57:57=' ',<WS>,channel=1,3:1]
[@17,58:61='CMD2',<'CMD2'>,3:2]
[@18,62:62=' ',<WS>,channel=1,3:6]
[@19,63:67='VAR1=',<'VAR1='>,3:7]
[@20,68:72='VAL12',<ID>,3:12]
[@21,73:73=' ',<WS>,channel=1,3:17]
[@22,74:76='-\n/',<CONTINUE_EOL_SLASH>,channel=1,3:18]
[@23,77:82='      ',<WS>,channel=1,4:1]
[@24,83:87='VAR1=',<'VAR1='>,4:7]
[@25,88:92='VAL22',<ID>,4:12]
[@26,93:93='\n',<EOL>,4:17]
[@27,94:94='/',<'/'>,5:0]
[@28,95:95=' ',<WS>,channel=1,5:1]
[@29,96:101='CMDIGN',<'CMDIGN'>,5:2]
[@30,102:102='\n',<EOL>,5:8]
[@31,103:103='/',<'/'>,6:0]
[@32,104:104='\n',<EOL>,6:1]
[@33,105:105='/',<'/'>,7:0]
[@34,106:106=' ',<WS>,channel=1,7:1]
[@35,107:112='CMDIGN',<'CMDIGN'>,7:2]
[@36,113:150=' VAR-1=0 -   \n/     

Tree output:

(root 
  (line 
    (commandLine 
      / 
      (command 
        (cmd1 CMD1 VAR1= VAL1 VAR1= VAL2)
      )
    )
  ) 
  \n 
  (line 
    (commandLine 
      / 
      (command 
        (cmdign CMDIGN  VAR1=BLAH VAR2=BLAH)
      )
    )
  ) 
  \n 
  (line 
    (commandLine 
      / 
      (command 
        (cmd2 CMD2 VAR1= VAL12 VAR1= VAL22)
      )
    )
  ) 
  \n 
  (line 
    (commandLine 
      / 
      (command 
        (cmdign CMDIGN)
      )
    )
  ) 
  \n 
  (line 
    (emptyLine /)
  ) 
  \n 
  (line 
    (commandLine 
      / 
      (command 
        (cmdign CMDIGN  VAR-1=0 -   \n/        VAR2=notignored)
      )
    )
  ) 
  <EOF>
)