Is it possible with ANTLR4 to determine the token types that are applicable at a given position?
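I am trying to build some kind of autocompletion tool on top of ANTLR4, but I have run into a problem (possibly with my own understanding). I am using an ErrorListener and trying to get the applicable tokens from the RecognitionException object, but this approach does not always work.
I have a grammar: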
grammar WhereClause;
USER_NAME_COLUMN: 'user_name' ;
USER_AGE_COLUMN: 'user_age';
EQ : '=' ;
LTH : '<' ;
GTH : '>' ;
WS : ( ' ' | '\t' )+ -> skip ;
stringColumn: USER_NAME_COLUMN ;
numericColumn: USER_AGE_COLUMN;
stringRelationalOperator: EQ ;
numericRelationalOperator: EQ | LTH | GTH ;
expression: stringColumn stringRelationalOperator stringColumn | numericColumn numericRelationalOperator numericColumn ;
And a simple test:
import java.util.BitSet;
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.atn.ATNConfigSet;
import org.antlr.v4.runtime.dfa.DFA;

public static void main(String... args) {
    String data = "user_name = user_name";
    // Parse every prefix of the input, one character longer each time.
    for (int i = 1; i <= data.length(); i++) {
        String input = data.substring(0, i);
        System.out.println("===========================");
        System.out.println(">> " + input + "");
        parse(input);
    }
}

private static void parse(String input) {
    ANTLRInputStream inputStream = new ANTLRInputStream(input);
    WhereClauseLexer lexer = new WhereClauseLexer(inputStream);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    WhereClauseParser parser = new WhereClauseParser(tokens);
    lexer.removeErrorListeners();
    parser.removeErrorListeners();
    parser.addErrorListener(new ANTLRErrorListener() {
        @Override
        public void syntaxError(Recognizer<?, ?> recognizer, Object o, int i, int i1, String s, RecognitionException e) {
            Vocabulary vocabulary = recognizer.getVocabulary();
            // Expected tokens are printed only when an exception object is passed in.
            if (e != null) {
                e.getExpectedTokens().getIntervals().forEach(interval -> {
                    for (int j = interval.a; j <= interval.b; j++) {
                        System.out.println(vocabulary.getDisplayName(j));
                    }
                });
            }
        }

        @Override
        public void reportAmbiguity(Parser parser, DFA dfa, int i, int i1, boolean b, BitSet bitSet, ATNConfigSet atnConfigSet) {}

        @Override
        public void reportAttemptingFullContext(Parser parser, DFA dfa, int i, int i1, BitSet bitSet, ATNConfigSet atnConfigSet) {}

        @Override
        public void reportContextSensitivity(Parser parser, DFA dfa, int i, int i1, int i2, ATNConfigSet atnConfigSet) {}
    });
    parser.expression();
}
As a result I get the following output:
===========================
>> u
'user_name'
'user_age'
===========================
>> us
'user_name'
'user_age'
===========================
>> use
'user_name'
'user_age'
===========================
>> user
'user_name'
'user_age'
===========================
>> user_
'user_name'
'user_age'
===========================
>> user_n
'user_name'
'user_age'
===========================
>> user_na
'user_name'
'user_age'
===========================
>> user_nam
'user_name'
'user_age'
===========================
>> user_name
'='
===========================
>> user_name
'='
===========================
>> user_name =
===========================
>> user_name =
===========================
>> user_name = u
===========================
>> user_name = us
===========================
>> user_name = use
===========================
>> user_name = user
===========================
>> user_name = user_
===========================
>> user_name = user_n
===========================
>> user_name = user_na
===========================
>> user_name = user_nam
===========================
>> user_name = user_name
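This means I do not get the expected tokens for the right-hand side of the equation. Does anyone know why? Is it possible to find out which token should follow the input entered so far?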
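Using the error position for code completion will not work well. What if the caret position and the error position do not coincide? Furthermore, this simple approach only gives you the expected keyword tokens, but usually you want more (for example, all variables available at the given position). So you need a symbol table, and you need a way to determine which kind of symbol is expected at a given position, and so on.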
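As a rough illustration of the symbol-table idea, here is a minimal sketch in plain Java with no ANTLR dependency. It assumes that some other component has already worked out which kind of symbol is expected at the caret; the class SymbolTableCompletion, the enum Kind, and the column user_email are hypothetical names made up for this example and do not come from ANTLR or from the grammar above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical symbol table: maps each known column name to the kind of symbol it is.
final class SymbolTableCompletion {
    enum Kind { STRING_COLUMN, NUMERIC_COLUMN }

    // LinkedHashMap keeps the candidates in definition order.
    private final Map<String, Kind> symbols = new LinkedHashMap<>();

    void define(String name, Kind kind) {
        symbols.put(name, kind);
    }

    // Returns all symbols of the expected kind whose name starts with what the user typed.
    List<String> complete(Kind expected, String typedPrefix) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Kind> entry : symbols.entrySet()) {
            if (entry.getValue() == expected && entry.getKey().startsWith(typedPrefix)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SymbolTableCompletion table = new SymbolTableCompletion();
        table.define("user_name", Kind.STRING_COLUMN);
        table.define("user_age", Kind.NUMERIC_COLUMN);
        table.define("user_email", Kind.STRING_COLUMN); // hypothetical extra column
        // A string column is expected and the user has typed "user_".
        System.out.println(table.complete(Kind.STRING_COLUMN, "user_"));
    }
}
```

The point of the sketch is that the candidates come from the table rather than from the token vocabulary, so columns can be offered even when the grammar has no literal token for them.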
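Using a parser to get code completion candidates does not work well. Remember that a parser follows a single path that matches the input, whereas you need all possible paths.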
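To make the difference between one path and all paths concrete, here is a minimal sketch in plain Java that does not use ANTLR at all. It assumes the four sentences accepted by the expression rule of the WhereClause grammar have been written out by hand as token sequences; the class FollowCandidates and its method candidates are hypothetical names invented for this illustration. A real implementation would presumably walk the grammar's internal structures instead of enumerating sentences, which is only feasible here because the grammar is tiny.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of ANTLR.
final class FollowCandidates {
    // Every sentence of `expression`, flattened by hand into token texts.
    static final List<List<String>> SENTENCES = Arrays.asList(
        Arrays.asList("user_name", "=", "user_name"),
        Arrays.asList("user_age", "=", "user_age"),
        Arrays.asList("user_age", "<", "user_age"),
        Arrays.asList("user_age", ">", "user_age"));

    // Collects the next token of every sentence that starts with the given tokens,
    // i.e. it follows all matching paths instead of committing to a single one.
    static Set<String> candidates(List<String> prefix) {
        Set<String> result = new TreeSet<>();
        for (List<String> sentence : SENTENCES) {
            if (sentence.size() > prefix.size()
                    && sentence.subList(0, prefix.size()).equals(prefix)) {
                result.add(sentence.get(prefix.size()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(candidates(Arrays.<String>asList()));      // [user_age, user_name]
        System.out.println(candidates(Arrays.asList("user_name")));   // [=]
        System.out.println(candidates(Arrays.asList("user_name", "="))); // [user_name]
        System.out.println(candidates(Arrays.asList("user_age")));    // [<, =, >]
    }
}
```

Unlike the error-listener approach in the question, this lookup also yields a candidate after "user_name =", because it does not depend on the parser reporting an error at that position.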
In this blog post I described a possible approach using ANTLR3, and I'm working on one for ANTLR4. Another attempt has been published by Federico Tomassetti. It still only returns keywords, but at least it doesn't use a parser for that.
Here is a discussion with Terence Parr about providing a function that returns the full follow set: https://github.com/antlr/antlr4/issues/1428