ANTLR4 Lexer getTokens() returning 0 tokens
I am running the code from here: https://github.com/bkiers/antlr4-csv-demo.
I wanted to see the tokens analyzed by the lexer by adding this line:
System.out.println("Number of tokens: " + tokens.getTokens().size());
to Main.java:
public static void main(String[] args) throws Exception {
// the input source
String source =
"aaa,bbb,ccc" + "\n" +
"\"d,\"\"d\",eee,fff";
// create an instance of the lexer
CSVLexer lexer = new CSVLexer(new ANTLRInputStream(source));
// wrap a token-stream around the lexer
CommonTokenStream tokens = new CommonTokenStream(lexer);
// look at tokens analyzed
System.out.println("Number of tokens: " + tokens.getTokens().size());
// create the parser
CSVParser parser = new CSVParser(tokens);
// invoke the entry point of our grammar
List<List<String>> data = parser.file().data;
// display the contents of the CSV source
for(int r = 0; r < data.size(); r++) {
List<String> row = data.get(r);
for(int c = 0; c < row.size(); c++) {
System.out.println("(row=" + (r+1) + ",col=" + (c+1) + ") = " + row.get(c));
}
}
}
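The printed result is: Number of tokens: 0. Why is the list returned by getTokens() empty? The rest of the parser code returns the data just fine.
Edit: so using lexer.getAllTokens() works, but why does CommonTokenStream not return the correct tokens?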
csv.g4:
grammar CSV;
@header {
package csv;
}
file returns [List<List<String>> data]
@init {$data = new ArrayList<List<String>>();}
: (row {$data.add($row.list);})+ EOF
;
row returns [List<String> list]
@init {$list = new ArrayList<String>();}
: a=value {$list.add($a.val);} (Comma b=value {$list.add($b.val);})* (LineBreak | EOF)
;
value returns [String val]
: SimpleValue {$val = $SimpleValue.text;}
| QuotedValue
{
$val = $QuotedValue.text;
$val = $val.substring(1, $val.length()-1); // remove leading- and trailing quotes
$val = $val.replace("\"\"", "\""); // replace all `""` with `"`
}
;
Comma
: ','
;
LineBreak
: '\r'? '\n'
| '\r'
;
SimpleValue
: ~[,\r\n"]+
;
QuotedValue
: '"' ('""' | ~'"')* '"'
;
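Normally, the parser is responsible for initiating lexing of the input stream. In your Main.java the println runs before parser.file() is invoked, so nothing has been lexed yet and the stream's buffer is still empty. To initiate lexing manually, call CommonTokenStream.fill() (implemented in BufferedTokenStream).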
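In the question's Main.java, that would mean calling tokens.fill(); on the line before the println. To make the lazy behavior easier to see without the ANTLR runtime or the generated CSVLexer, here is a rough stand-alone sketch. LazyTokenBuffer, its fields, and the "<EOF>" marker are hypothetical names invented for this illustration; this is not ANTLR's implementation, only a toy model of a buffer that stays empty until something asks it to pull tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class LazyTokenBufferDemo {

    /** Hypothetical stand-in for a buffered token stream; NOT an ANTLR class. */
    static final class LazyTokenBuffer {
        private final Iterator<String> source;              // plays the role of the lexer
        private final List<String> buffer = new ArrayList<>();
        private boolean hitEof = false;

        LazyTokenBuffer(List<String> lexemes) {
            this.source = lexemes.iterator();
        }

        /** Returns only what has been buffered so far; never pulls from the source. */
        List<String> getTokens() {
            return Collections.unmodifiableList(buffer);
        }

        /** Pulls every remaining token and appends a synthetic end-of-input marker. */
        void fill() {
            while (!hitEof) {
                if (source.hasNext()) {
                    buffer.add(source.next());
                } else {
                    buffer.add("<EOF>");                    // marker chosen for this sketch
                    hitEof = true;
                }
            }
        }
    }

    public static void main(String[] args) {
        LazyTokenBuffer tokens =
                new LazyTokenBuffer(Arrays.asList("aaa", ",", "bbb", ",", "ccc"));

        // Nothing has requested a token yet, so the buffer is empty.
        System.out.println("Before fill(): " + tokens.getTokens().size());

        // Force the whole input to be tokenized up front.
        tokens.fill();
        System.out.println("After fill(): " + tokens.getTokens().size());
    }
}
```

In this sketch the count goes from 0 to 6 (five lexemes plus the end marker). A second call to fill() is a no-op here, so calling it before handing the stream to a consumer does no harm in the toy model.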