ANTLR4 handling continuations for "any data"
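The grammar I need to create is based on the following:
1. Command lines start with a slash
2. Command lines can be continued with a hyphen as the last character (not counting whitespace) on a line
3. For some commands I want to parse their parameters
4. For other commands I am not interested in their parameters
This almost works with the following (simplified) lexer: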
lexer grammar T1Lexer;
NewLine
: [\r\n]+ -> skip
;
CommandStart
: '/' -> pushMode(CommandMode)
;
DataStart
: . -> more, pushMode(DataMode)
;
mode DataMode;
DataLine
: ~[\r\n]+ -> popMode
;
mode CommandMode;
CmNL
: [\r\n]+ -> skip, popMode
;
CONTINUEMINUS : ( '-' [ ]* ('\r/' | '\n/' | '\r\n/') ) -> channel(HIDDEN);
EOL: ( [ ]* ('\r' | '\n' | '\r\n') ) -> popMode;
SPACE : [ \t\r\n]+ -> channel(HIDDEN) ;
DOT : [.] ;
COMMA : ',' ;
CMD1 : 'CMD1';
CMD2 : 'CMD2';
CMDIGN : 'CMDIGN' -> pushMode(DataMode) ;
VAR1 : 'VAR1=' ;
ID : ID_LITERAL;
fragment ID_LITERAL: [A-Z_$0-9]*?[A-Z_$]+?[A-Z_$0-9]*;
and parser:
parser grammar T1Parser;
options { tokenVocab=T1Lexer; }
root : line+ EOF ;
line: ( commandLine | dataLine)+ ;
dataLine : DataLine ;
commandLine : CommandStart command ;
command : cmd1 | cmd2 | cmdign ;
cmd1 : CMD1 (VAR1 ID)+ ;
cmd2 : CMD2 (VAR1 ID)+ ;
cmdign : CMDIGN DataLine ;
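The problem arises where I need the combination of 2. + 4., i.e. the continuation of a command whose parameters I simply want to get as an unparsed string (lines 5+6 in the example).
When I push DataMode for CMDIGN on line 5, the continuation character is not recognized because it is swallowed by the "anything until EOL" rule, so the lexer returns to the default mode and the continuation line is treated as a new command, which cannot be parsed.
Is there a way to handle this combination properly?
TIA - Alex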
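The failure described above can be reproduced outside ANTLR. The sketch below is not the ANTLR lexer: it uses Python regular expressions as rough stand-ins for the DataLine and CONTINUEMINUS rules, and the two-line input is an assumed sample, since the question's own example input is not shown.

```python
import re

# Rough regex stand-ins for two lexer rules from the question.
# Assumption: Python's re module only approximates ANTLR lexer semantics.
DATA_LINE = re.compile(r"[^\r\n]+")                    # DataLine: ~[\r\n]+
CONTINUE_MINUS = re.compile(r"-[ ]*(?:\r\n|\r|\n)/")   # CONTINUEMINUS

# Assumed sample: a CMDIGN command continued onto a second line.
source = "/ CMDIGN VAR-1=0 -\n/ VAR2=notignored\n"

# DataMode is pushed right after the CMDIGN keyword.
start = source.index("CMDIGN") + len("CMDIGN")

data = DATA_LINE.match(source, start)
swallowed = data.group()        # everything up to the line break
rest = source[data.end():]      # what is left for the default mode

# The hyphen is already inside the DataLine text, so nothing that looks
# like a continuation marker remains in the rest of the input.
continuation_seen = CONTINUE_MINUS.search(source, data.end()) is not None

print(repr(swallowed))
print(repr(rest))
print(continuation_seen)
```

The hyphen ends up inside the data text, and the next line starts again with a plain `/`, which is why it is lexed as a new command.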
For this example, at least, you don't really need CommandMode; it actually complicates things a bit.
T1Lexer.g4:
lexer grammar T1Lexer
;
CMD_START: '/';
CONTINUE_EOL_SLASH: '-' EOL_F '/' -> channel(HIDDEN);
EOL: EOL_F;
WS: [ \t]+ -> channel(HIDDEN);
DOT: [.];
COMMA: ',';
CMD1: 'CMD1';
CMD2: 'CMD2';
CMDIGN: 'CMDIGN' -> pushMode(DataMode);
VAR1: 'VAR1=';
ID: ID_LITERAL;
//=======================================
mode DataMode
;
DM_EOL: EOL_F -> type(EOL), popMode;
DATA_LINE: ( ~[\r\n]*? '-' EOL_F)* ~[\r\n]+;
//=======================================
fragment NL: '\r'? '\n';
fragment EOL_F: [ ]* NL;
fragment ID_LITERAL: [A-Z_$0-9]*? [A-Z_$]+? [A-Z_$0-9]*;
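The key rule in this lexer is DATA_LINE, which lets a data token run across a line break when the line ends in a hyphen. As a rough illustration only, the rule can be approximated with a Python regular expression; the regex, the helper name `data_line_text`, and the sample strings are assumptions for this sketch and are not part of the grammar.

```python
import re

# Regex approximation (an assumption, not ANTLR itself) of
#   DATA_LINE: ( ~[\r\n]*? '-' EOL_F)* ~[\r\n]+;
#   EOL_F:     [ ]* '\r'? '\n';
DATA_LINE = re.compile(r"(?:[^\r\n]*?-[ ]*\r?\n)*[^\r\n]+")


def data_line_text(source: str, start: int) -> str:
    """Return the text a DATA_LINE-like rule would consume from `start`."""
    match = DATA_LINE.match(source, start)
    return match.group() if match else ""


single = "/ CMDIGN VAR1=BLAH VAR2=BLAH\n/ CMD1 VAR1=VAL1\n"
continued = "/ CMDIGN VAR-1=0 - \n/ VAR2=notignored\n/ CMD1 VAR1=VAL1\n"
after_cmd = len("/ CMDIGN")  # position where DataMode would be pushed

# Without a trailing hyphen the match stops at the end of the line.
print(repr(data_line_text(single, after_cmd)))

# With a trailing hyphen the match continues onto the next line; the
# hyphen inside 'VAR-1' does not count, because no line break follows it.
print(repr(data_line_text(continued, after_cmd)))
```

In this approximation, as in the rule itself, only a hyphen followed by optional spaces and a line break extends the token, so the whole continued command arrives as a single piece of text.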
T1Parser.g4
parser grammar T1Parser
;
options {
tokenVocab = T1Lexer;
}
root: line (EOL line)* EOL? EOF;
line: commandLine | dataLine | emptyLine;
dataLine: DATA_LINE;
commandLine: CMD_START command;
emptyLine: CMD_START;
command: cmd1 | cmd2 | cmdign;
cmd1: CMD1 (VAR1 ID)+;
cmd2: CMD2 (VAR1 ID)+;
cmdign: CMDIGN DATA_LINE?;
Test input:
/ CMD1 VAR1=VAL1 VAR1=VAL2
/ CMDIGN VAR1=BLAH VAR2=BLAH
/ CMD2 VAR1=VAL12 -
/      VAR1=VAL22
/ CMDIGN
/
/ CMDIGN VAR-1=0 -
/ VAR2=notignored
Token stream:
[@0,0:0='/',<'/'>,1:0]
[@1,1:1=' ',<WS>,channel=1,1:1]
[@2,2:5='CMD1',<'CMD1'>,1:2]
[@3,6:6=' ',<WS>,channel=1,1:6]
[@4,7:11='VAR1=',<'VAR1='>,1:7]
[@5,12:15='VAL1',<ID>,1:12]
[@6,16:16=' ',<WS>,channel=1,1:16]
[@7,17:21='VAR1=',<'VAR1='>,1:17]
[@8,22:25='VAL2',<ID>,1:22]
[@9,26:26='\n',<EOL>,1:26]
[@10,27:27='/',<'/'>,2:0]
[@11,28:28=' ',<WS>,channel=1,2:1]
[@12,29:34='CMDIGN',<'CMDIGN'>,2:2]
[@13,35:54=' VAR1=BLAH VAR2=BLAH',<DATA_LINE>,2:8]
[@14,55:55='\n',<EOL>,2:28]
[@15,56:56='/',<'/'>,3:0]
[@16,57:57=' ',<WS>,channel=1,3:1]
[@17,58:61='CMD2',<'CMD2'>,3:2]
[@18,62:62=' ',<WS>,channel=1,3:6]
[@19,63:67='VAR1=',<'VAR1='>,3:7]
[@20,68:72='VAL12',<ID>,3:12]
[@21,73:73=' ',<WS>,channel=1,3:17]
[@22,74:76='-\n/',<CONTINUE_EOL_SLASH>,channel=1,3:18]
[@23,77:82='      ',<WS>,channel=1,4:1]
[@24,83:87='VAR1=',<'VAR1='>,4:7]
[@25,88:92='VAL22',<ID>,4:12]
[@26,93:93='\n',<EOL>,4:17]
[@27,94:94='/',<'/'>,5:0]
[@28,95:95=' ',<WS>,channel=1,5:1]
[@29,96:101='CMDIGN',<'CMDIGN'>,5:2]
[@30,102:102='\n',<EOL>,5:8]
[@31,103:103='/',<'/'>,6:0]
[@32,104:104='\n',<EOL>,6:1]
[@33,105:105='/',<'/'>,7:0]
[@34,106:106=' ',<WS>,channel=1,7:1]
[@35,107:112='CMDIGN',<'CMDIGN'>,7:2]
[@36,113:150=' VAR-1=0 - \n/
Tree output:
(root
(line
(commandLine
/
(command
(cmd1 CMD1 VAR1= VAL1 VAR1= VAL2)
)
)
)
\n
(line
(commandLine
/
(command
(cmdign CMDIGN VAR1=BLAH VAR2=BLAH)
)
)
)
\n
(line
(commandLine
/
(command
(cmd2 CMD2 VAR1= VAL12 VAR1= VAL22)
)
)
)
\n
(line
(commandLine
/
(command
(cmdign CMDIGN)
)
)
)
\n
(line
(emptyLine /)
)
\n
(line
(commandLine
/
(command
(cmdign CMDIGN VAR-1=0 - \n/ VAR2=notignored)
)
)
)
<EOF>
)
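As the last cmdign node shows, the DATA_LINE text of a continued command still contains the hyphen, the line break, and the slash of the continuation line. If a single unparsed parameter string is wanted, that marker could be stripped after parsing. The following is a minimal sketch under that assumption; `join_continuations` is a hypothetical helper, not something provided by ANTLR or by the grammar above.

```python
import re

# Hyphen, optional spaces, a line break, then the '/' that opens the
# continuation line. This mirrors CONTINUE_EOL_SLASH as a regex and is
# only an approximation of the lexer rule.
CONTINUATION = re.compile(r"-[ ]*\r?\n/")


def join_continuations(data_line: str) -> str:
    """Hypothetical helper: splice continuation lines into one string.

    Whitespace around the removed marker is kept as-is; only leading and
    trailing whitespace of the whole result is trimmed.
    """
    return CONTINUATION.sub("", data_line).strip()


# Text of a continued DATA_LINE, written here with single spaces.
raw = " VAR-1=0 - \n/ VAR2=notignored"
print(repr(join_continuations(raw)))

# A DATA_LINE without continuation is only trimmed.
print(repr(join_continuations(" VAR1=BLAH VAR2=BLAH")))
```

The hyphen in `VAR-1` is left alone because it is not followed by a line break; whether the remaining inner whitespace should also be collapsed depends on how the parameter string is used afterwards.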