How to write a grammar to capture inline comments while ignoring lines with just comments?
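I'm writing a grammar for a new language I'm developing.
The language defines comments as follows:
- A comment can be either an "inline" or an "only-line" comment
- An "inline" comment starts with #
- An "only-line" comment starts with # or *
- Every language statement ends with a newline
- "only-line" comments can be ignored
- "inline" comments should be processed (their value is passed to a tree walker in the code generation phase)
Example: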
keyword(0x12, 0x12) # this is an inline comment
keyword(0x34, 0x34) # this is another inline comment
# this is an "only-line" comment
* this is another "only-line" comment
keyword(0x55, 0x55) # this is the 3rd inline comment
This is the (simplified) grammar I wrote to achieve this:
statement : empty_line
| comment_statement
| keyword_statement
;
keyword_statement : 'keyword' '(' HEX_VALUE ',' HEX_VALUE ')' in_line_comment?;
in_line_comment : IN_LINE_COMMENT;
comment_statement : LINE_COMMENT;
empty_line : NL;
IN_LINE_COMMENT : '#' ~[\r\n]* ;
LINE_COMMENT : [#*] ~[\r\n]* -> skip;
HEX_VALUE : '0x' [0-9a-fA-F]+;
NL : '\r'? '\n' -> channel(2);
WS : [ \t]+ -> skip;
Compiling it with Antlr4 and feeding the example text into the grammar produces:
[@0,0:6='keyword',<'keyword'>,1:0]
[@1,7:7='(',<'('>,1:7]
[@2,8:11='0x12',<HEX_VALUE>,1:8]
[@3,12:12=',',<','>,1:12]
[@4,14:17='0x12',<HEX_VALUE>,1:14]
[@5,18:18=')',<')'>,1:18]
[@6,20:46='# this is an inline comment',<IN_LINE_COMMENT>,1:20]
[@7,47:47='\n',<NL>,channel=2,1:47]
[@8,48:54='keyword',<'keyword'>,2:0]
[@9,55:55='(',<'('>,2:7]
[@10,56:59='0x34',<HEX_VALUE>,2:8]
[@11,60:60=',',<','>,2:12]
[@12,62:65='0x34',<HEX_VALUE>,2:14]
[@13,66:66=')',<')'>,2:18]
[@14,68:99='# this is another inline comment',<IN_LINE_COMMENT>,2:20]
[@15,100:100='\n',<NL>,channel=2,2:52]
[@16,101:101='\n',<NL>,channel=2,3:0]
[@17,102:133='# this is an "only-line" comment',<IN_LINE_COMMENT>,4:0]
[@18,134:134='\n',<NL>,channel=2,4:32]
[@19,172:172='\n',<NL>,channel=2,5:37]
[@20,173:179='keyword',<'keyword'>,6:0]
[@21,180:180='(',<'('>,6:7]
[@22,181:184='0x55',<HEX_VALUE>,6:8]
[@23,185:185=',',<','>,6:12]
[@24,187:190='0x55',<HEX_VALUE>,6:14]
[@25,191:191=')',<')'>,6:18]
[@26,193:224='# this is the 3rd inline comment',<IN_LINE_COMMENT>,6:20]
[@27,225:225='\n',<NL>,channel=2,6:52]
[@28,226:225='<EOF>',<EOF>,7:0]
line 4:0 extraneous input '# this is an "only-line" comment' expecting {<EOF>, 'keyword', LINE_COMMENT, NL}
This shows that the "only-line" comment starting with # was tokenized as IN_LINE_COMMENT rather than as LINE_COMMENT.
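The token dump is consistent with how an ANTLR lexer resolves ambiguity: it takes the rule that matches the most input and, on a tie, the rule listed first in the grammar. Both comment rules match the whole # line, so IN_LINE_COMMENT wins. The Python below is only a rough simulation of that tie-breaking using regular expressions; RULES and next_token are hypothetical names for illustration, not ANTLR APIs, and the patterns are simplified stand-ins for the lexer rules above.

```python
import re

# Simplified stand-ins for the lexer rules, kept in grammar order.
# (Assumption: regexes only approximate the ANTLR rules.)
RULES = [
    ("IN_LINE_COMMENT", re.compile(r"#[^\r\n]*")),
    ("LINE_COMMENT", re.compile(r"[#*][^\r\n]*")),
    ("HEX_VALUE", re.compile(r"0x[0-9a-fA-F]+")),
    ("NL", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t]+")),
]


def next_token(text, pos):
    """Return (rule_name, lexeme): longest match wins, first rule wins ties."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strict '>' keeps the earlier rule when two matches have equal length.
        if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best


print(next_token('# this is an "only-line" comment\n', 0))
# ('IN_LINE_COMMENT', '# this is an "only-line" comment')
print(next_token('* this is another "only-line" comment\n', 0))
# ('LINE_COMMENT', '* this is another "only-line" comment')
```

Only the * line reaches LINE_COMMENT, because IN_LINE_COMMENT cannot match it at all.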
How can I instruct the grammar to handle that comment differently?
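OK. Did some digging myself, and I'm posting the result as a service to the community.
Here is my solution.
I used a semantic predicate in the grammar to solve this.
The solution is currently implemented in Java (just to take the complexity of Antlr4 Python out of the picture), but I'll be sure to translate the below to Python.
My modified grammar: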
@lexer::members {
int in_line = 0; // initialize to "only-line"
}
prog : statement+ EOF;
statement : empty_line
| comment_statement
| keyword_statement
;
keyword_statement : KEYWORD '(' HEX_VALUE ',' HEX_VALUE ')' in_line_comment?;
in_line_comment : IN_LINE_COMMENT;
comment_statement : LINE_COMMENT;
empty_line : NL;
KEYWORD : 'keyword' {in_line = 1;};
IN_LINE_COMMENT : '#' ~[\r\n]* {in_line == 1}?; // matched only if in_line == 1 at run time
LINE_COMMENT : [#*] ~[\r\n]* -> skip;
HEX_VALUE : '0x' [0-9a-fA-F]+;
NL : '\r'? '\n' {in_line = 0;} -> channel(2); // reset in_line to 0 after every statement
WS : [ \t]+ -> skip;
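To illustrate what the in_line flag and the predicate do, here is a minimal Python sketch that mimics the modified lexer with regular expressions. It does not use the ANTLR runtime; TOKEN_RULES, SKIPPED, PUNCT, and tokenize are hypothetical names introduced for this illustration, and the regexes only approximate the grammar rules.

```python
import re

# Regex stand-ins for the lexer rules, in grammar order (an approximation).
TOKEN_RULES = [
    ("KEYWORD", re.compile(r"keyword")),
    ("IN_LINE_COMMENT", re.compile(r"#[^\r\n]*")),
    ("LINE_COMMENT", re.compile(r"[#*][^\r\n]*")),
    ("HEX_VALUE", re.compile(r"0x[0-9a-fA-F]+")),
    ("NL", re.compile(r"\r?\n")),
    ("WS", re.compile(r"[ \t]+")),
    ("PUNCT", re.compile(r"[(),]")),  # stands in for the '(' ',' ')' literals
]
SKIPPED = {"LINE_COMMENT", "WS"}  # rules marked '-> skip' in the grammar


def tokenize(text):
    in_line = False  # mirrors 'int in_line = 0'
    pos = 0
    tokens = []
    while pos < len(text):
        best = None
        for name, pattern in TOKEN_RULES:
            # Mirrors the {in_line == 1}? predicate: the rule is not viable
            # unless a KEYWORD was already seen on the current line.
            if name == "IN_LINE_COMMENT" and not in_line:
                continue
            m = pattern.match(text, pos)
            # Longest match wins; strict '>' lets the earlier rule win ties.
            if m and m.end() > pos and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, lexeme = best
        if name == "KEYWORD":
            in_line = True   # mirrors {in_line = 1;}
        elif name == "NL":
            in_line = False  # mirrors {in_line = 0;}
        if name not in SKIPPED:
            tokens.append(best)
        pos += len(lexeme)
    return tokens


SAMPLE = (
    'keyword(0x12, 0x12) # this is an inline comment\n'
    '# this is an "only-line" comment\n'
    '* this is another "only-line" comment\n'
    'keyword(0x55, 0x55) # this is the 3rd inline comment\n'
)
comments = [lexeme for name, lexeme in tokenize(SAMPLE) if name == "IN_LINE_COMMENT"]
print(comments)
# ['# this is an inline comment', '# this is the 3rd inline comment']
```

With the flag in place, a # comment is emitted as IN_LINE_COMMENT only when it follows a keyword on the same line; at the start of a line the predicate fails, so LINE_COMMENT matches and the text is skipped.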