Iterating over tokens in HIDDEN channel
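I'm currently creating an IDE for the custom, very Lua-like scripting language MobTalkerScript (MTS), which provides me with an ANTLR4 lexer. Since the MTS language's specification file puts comments into the HIDDEN_CHANNEL channel, I need to tell the lexer to actually read from the HIDDEN_CHANNEL channel. This is how I tried to do that: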
import org.antlr.v4.runtime.*;

Mts3Lexer lexer = new Mts3Lexer(new ANTLRInputStream("<replace this with the input>"));
lexer.setTokenFactory(new CommonTokenFactory(false));
// Try to make the lexer read from the hidden channel
lexer.setChannel(Token.HIDDEN_CHANNEL);
// Pull tokens one at a time until EOF
for (Token token = lexer.nextToken(); token.getType() != Token.EOF; token = lexer.nextToken()) {
    switch (token.getType()) {
        case Mts3Lexer.LINE_COMMENT:
        case Mts3Lexer.COMMENT:
            System.out.println("token " + token.getText() + " is a comment");
            break; // without this, comments would fall through and also print "is not a comment"
        default:
            System.out.println("token " + token.getText() + " is not a comment");
    }
}
Now, if I run this code on the following input, the console only ever prints "token ... is not a comment".
function foo()
-- this should be a single-line comment
something = "blah"
--[[ this should
be a multi-line
comment ]]--
end
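However, the tokens containing the comments never show up. So I went looking for the source of the problem and found the following method in the ANTLR4 Lexer class: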
/** Return a token from this source; i.e., match a token on the char
 *  stream.
 */
@Override
public Token nextToken() {
    if (_input == null) {
        throw new IllegalStateException("nextToken requires a non-null input stream.");
    }
    // Mark start location in char stream so unbuffered streams are
    // guaranteed at least have text of current token
    int tokenStartMarker = _input.mark();
    try{
        outer:
        while (true) {
            if (_hitEOF) {
                emitEOF();
                return _token;
            }
            _token = null;
            _channel = Token.DEFAULT_CHANNEL;
            _tokenStartCharIndex = _input.index();
            _tokenStartCharPositionInLine = getInterpreter().getCharPositionInLine();
            _tokenStartLine = getInterpreter().getLine();
            _text = null;
            do {
                _type = Token.INVALID_TYPE;
                // System.out.println("nextToken line "+tokenStartLine+" at "+((char)input.LA(1))+
                //                    " in mode "+mode+
                //                    " at index "+input.index());
                int ttype;
                try {
                    ttype = getInterpreter().match(_input, _mode);
                }
                catch (LexerNoViableAltException e) {
                    notifyListeners(e); // report error
                    recover(e);
                    ttype = SKIP;
                }
                if ( _input.LA(1)==IntStream.EOF ) {
                    _hitEOF = true;
                }
                if ( _type == Token.INVALID_TYPE ) _type = ttype;
                if ( _type ==SKIP ) {
                    continue outer;
                }
            } while ( _type ==MORE );
            if ( _token == null ) emit();
            return _token;
        }
    }
    finally {
        // make sure we release marker after match or
        // unbuffered char stream will keep buffering
        _input.release(tokenStartMarker);
    }
}
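To make the control flow of this method easier to follow, here is a heavily simplified, self-contained model of its per-token loop. It is plain Java, not the ANTLR runtime: the MiniLexerDemo and MiniLexer classes, the token types and the hideComments flag are all hypothetical. It mirrors two details of nextToken(): the channel is reset to the default before each token is matched, after which a rule action may change it (as a "-> channel(HIDDEN)" command would), and a rule that yields SKIP makes the loop start over, so the caller never receives that text at all.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniLexerDemo {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1;
    static final int EOF = -1;
    static final int SKIP = -3;
    static final int WORD = 1;
    static final int COMMENT = 2;

    // Hypothetical token: type, text and the channel it was emitted on.
    static final class Tok {
        final int type;
        final String text;
        final int channel;

        Tok(int type, String text, int channel) {
            this.type = type;
            this.text = text;
            this.channel = channel;
        }
    }

    static final class MiniLexer {
        private final String input;
        private final boolean hideComments; // true: like "-> channel(HIDDEN)", false: like "-> skip"
        private int pos = 0;
        private int channel; // plays the role of Lexer._channel

        MiniLexer(String input, boolean hideComments) {
            this.input = input;
            this.hideComments = hideComments;
        }

        Tok nextToken() {
            while (true) { // corresponds to the "outer" loop
                if (pos >= input.length()) {
                    return new Tok(EOF, "<EOF>", DEFAULT_CHANNEL);
                }
                channel = DEFAULT_CHANNEL; // reset before every token
                int start = pos;
                int type = match();        // a rule action may change channel here
                if (type == SKIP) {
                    continue;              // skipped text is never returned to the caller
                }
                return new Tok(type, input.substring(start, pos), channel);
            }
        }

        // Matches one token starting at pos and runs its "action".
        private int match() {
            if (Character.isWhitespace(input.charAt(pos))) {
                while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
                return SKIP; // whitespace is always skipped in this model
            }
            if (input.startsWith("--", pos)) {
                while (pos < input.length() && input.charAt(pos) != '\n') pos++;
                if (hideComments) {
                    channel = HIDDEN_CHANNEL;
                    return COMMENT;
                }
                return SKIP;
            }
            while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))
                    && !input.startsWith("--", pos)) pos++;
            return WORD;
        }
    }

    // Collects "text (channel N)" for every token the lexer hands out.
    static List<String> describe(String input, boolean hideComments) {
        MiniLexer lexer = new MiniLexer(input, hideComments);
        List<String> out = new ArrayList<>();
        for (Tok t = lexer.nextToken(); t.type != EOF; t = lexer.nextToken()) {
            out.add(t.text + " (channel " + t.channel + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        String src = "foo -- a comment\nbar";
        System.out.println(describe(src, false)); // [foo (channel 0), bar (channel 0)]
        System.out.println(describe(src, true));  // [foo (channel 0), -- a comment (channel 1), bar (channel 0)]
    }
}
```

In this model the reset of the channel at the start of each iteration does not stop a token from landing on the hidden channel; only the skip path makes a comment disappear.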
The line that caught my attention is this one:
_channel = Token.DEFAULT_CHANNEL;
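I don't know much about ANTLR, but apparently this line keeps the lexer on the DEFAULT_CHANNEL channel.
Is the way I'm trying to read from the HIDDEN_CHANNEL channel correct, or can't I use nextToken() with the hidden channel?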
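I found out why the lexer didn't give me any tokens containing comments: I had apparently missed that the grammar file skips comments instead of putting them into the hidden channel. I contacted the author, the grammar file was changed, and now it works.
Note to self: read more carefully.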
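Once the grammar sends comments to the hidden channel instead of skipping them, they come out of nextToken() like any other token. The channel is simply a property of each token (Token.getChannel() in the Java runtime); the lexer does not read "from" a channel. Which channel to look at is up to whoever consumes the tokens: by default a CommonTokenStream feeds the parser only the default channel, while an IDE can inspect everything. The sketch below models that split with a hypothetical Tok class instead of ANTLR's Token, so it runs without the ANTLR runtime; the token list is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChannelFilterDemo {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1;

    // Hypothetical stand-in for a lexer token: just its text and channel.
    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    // Keeps the text of every token on the given channel, in order.
    static List<String> onChannel(List<Tok> tokens, int channel) {
        List<String> out = new ArrayList<>();
        for (Tok t : tokens) {
            if (t.channel == channel) {
                out.add(t.text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // What a lexer might emit for the start of the question's input once
        // comments go to the hidden channel (whitespace assumed skipped here).
        List<Tok> tokens = Arrays.asList(
                new Tok("function", DEFAULT_CHANNEL),
                new Tok("foo", DEFAULT_CHANNEL),
                new Tok("(", DEFAULT_CHANNEL),
                new Tok(")", DEFAULT_CHANNEL),
                new Tok("-- this should be a single-line comment", HIDDEN_CHANNEL),
                new Tok("something", DEFAULT_CHANNEL));

        // Parser-style view: only the default channel.
        System.out.println(onChannel(tokens, DEFAULT_CHANNEL)); // [function, foo, (, ), something]
        // IDE-style view: the comments, picked out by channel.
        System.out.println(onChannel(tokens, HIDDEN_CHANNEL));  // [-- this should be a single-line comment]
    }
}
```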
For Go (golang), this code works for me:
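import (
	"github.com/antlr/antlr4/runtime/Go/antlr"
)

// antlrparser matches generated rule contexts, which expose the parser that built them.
type antlrparser interface {
	GetParser() antlr.Parser
}

// fullText returns the source text covered by prc, including hidden-channel
// tokens (such as whitespace) between its start and stop tokens.
func fullText(prc antlr.ParserRuleContext) string {
	p := prc.(antlrparser).GetParser()
	ts := p.GetTokenStream()
	tx := ts.GetTextFromTokens(prc.GetStart(), prc.GetStop())
	return tx
}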
Just pass your ctx.GetSomething() to fullText. Of course, the whitespace has to be sent to the hidden channel in the *.g4 file:
WS: [ \t\r\n] -> channel(HIDDEN);
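This works because the token stream buffers every token, hidden ones included, and GetTextFromTokens returns the text of all tokens between the rule's start and stop token indices, whatever their channel. A context's own GetText(), by contrast, only joins the tokens that ended up in the parse tree, so hidden whitespace is lost. (The Java runtime offers the same idea through TokenStream.getText(Token, Token).) Below is a minimal, runtime-free Java model of the difference; the FullTextDemo and Tok classes and both helper methods are hypothetical and only illustrate the idea.

```java
import java.util.Arrays;
import java.util.List;

public class FullTextDemo {
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN_CHANNEL = 1;

    // Hypothetical buffered token: its text and the channel it was emitted on.
    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    // Joins every buffered token from index start to stop (inclusive),
    // regardless of channel, like asking the token stream for a range.
    static String fullText(List<Tok> tokens, int start, int stop) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i <= stop; i++) {
            sb.append(tokens.get(i).text);
        }
        return sb.toString();
    }

    // For contrast: joining only default-channel tokens drops the whitespace.
    static String defaultChannelText(List<Tok> tokens, int start, int stop) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i <= stop; i++) {
            Tok t = tokens.get(i);
            if (t.channel == DEFAULT_CHANNEL) {
                sb.append(t.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Tokens for: something = "blah"  (whitespace on the hidden channel)
        List<Tok> tokens = Arrays.asList(
                new Tok("something", DEFAULT_CHANNEL),
                new Tok(" ", HIDDEN_CHANNEL),
                new Tok("=", DEFAULT_CHANNEL),
                new Tok(" ", HIDDEN_CHANNEL),
                new Tok("\"blah\"", DEFAULT_CHANNEL));
        System.out.println(fullText(tokens, 0, 4));           // something = "blah"
        System.out.println(defaultChannelText(tokens, 0, 4)); // something="blah"
    }
}
```

If the grammar skipped whitespace instead, those tokens would never reach the buffer and both methods would return the same text, which is why the WS rule above has to use channel(HIDDEN).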