How to parse Velocity variables using ANTLR4

Question

Velocity variables use the following notation (see the Velocity User Guide):

The shorthand notation of a variable consists of a leading "$" character followed by a VTL Identifier. A VTL Identifier must start with an alphabetic character (a .. z or A .. Z). The rest of the characters are limited to the following types of characters:

alphabetic (a .. z, A .. Z)

numeric (0 .. 9)

underscore ("_")

I want to use lexer modes to separate plain text from variables, so I wrote something like this:

// default mode
DOLLAR : '$' -> pushMode(VARIABLE);
TEXT : ~[$]+? -> skip;

mode VARIABLE;
ID : [a-zA-Z] [a-zA-Z0-9-_]*;
???? : XXX -> popMode;   // how can I pop mode to default?

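Because the variable notation has no explicit terminating character, I don't know how to determine where a variable ends.

Maybe I've got this wrong?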

Answer 1

You can pop out of the mode as soon as the identifier has been matched, like this:

mode VARIABLE;
  ID  : [a-zA-Z] [a-zA-Z0-9\-_]* -> popMode;

Here's a quick demo:

lexer grammar VelocityLexer;

DOLLAR : '$' -> more, pushMode(VARIABLE);
TEXT   : ~[$]+ -> skip;

mode VARIABLE;
  // the `-` needs to be escaped!
  ID : [a-zA-Z] [a-zA-Z0-9\-_]* -> popMode;

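Note the `more` in DOLLAR: it causes the $ to be included in the ID token. Without it, you would end up with two tokens ($ and foo for the input $foo).

Test the grammar with the following Java class: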

import org.antlr.v4.runtime.*;

public class Main {

  public static void main(String[] args) {

    VelocityLexer lexer = new VelocityLexer(CharStreams.fromString("<strong>$Mu</strong>$foo..."));
    CommonTokenStream tokenStream = new CommonTokenStream(lexer);
    tokenStream.fill();

    for (Token t : tokenStream.getTokens()) {
      System.out.printf("%-10s '%s'\n", VelocityLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
    }
  }
}

This will print:

ID         '$Mu'
ID         '$foo'
EOF        '<EOF>'
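
No explicit end character is needed because the ID rule consumes as many identifier characters as it can, and the mode is popped as soon as that token is emitted. Here is a rough hand-written sketch of that behaviour, without ANTLR. The class and method names are made up for illustration, and this simplifies what the generated lexer really does (in particular, error handling is left out):

```java
import java.util.ArrayList;
import java.util.List;

// Hand-rolled imitation of the two-mode lexer above; NOT generated by ANTLR.
final class TwoModeSketch {

  static boolean isIdStart(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
  }

  static boolean isIdPart(char c) {
    return isIdStart(c) || (c >= '0' && c <= '9') || c == '-' || c == '_';
  }

  static List<String> tokenize(String input) {
    List<String> ids = new ArrayList<>();
    int i = 0;
    while (i < input.length()) {
      if (input.charAt(i) != '$') {
        i++;                 // default mode: TEXT is skipped
        continue;
      }
      int start = i++;       // '$' -> more, pushMode(VARIABLE): keep the '$'
      if (i < input.length() && isIdStart(input.charAt(i))) {
        i++;
        while (i < input.length() && isIdPart(input.charAt(i))) {
          i++;               // ID takes the longest possible run of characters
        }
        ids.add(input.substring(start, i)); // emit ID, then popMode
      }
      // A '$' that is not followed by a letter is silently dropped in this
      // sketch; the real lexer would complain about unrecognized input.
    }
    return ids;
  }

  public static void main(String[] args) {
    // prints [$Mu, $foo]
    System.out.println(tokenize("<strong>$Mu</strong>$foo..."));
  }
}
```

The first character that cannot continue the identifier (`<` after `$Mu`, `.` after `$foo`) ends the token, and scanning resumes in the default mode.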

However, I don't think lexer modes are a good choice for something like an ID. Why not simply do:

lexer grammar VelocityLexer;

DOLLAR : '$' [a-zA-Z] [a-zA-Z0-9\-_]*;
TEXT   : ~[$]+ -> skip;

?
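
If you just want to sanity-check which substrings that single rule would pick up before generating a lexer, a plain java.util.regex sketch with the same character classes should give the same result for the demo input. This is not ANTLR, and the class name VariableScan and method findVariables are only for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VariableScan {
  // Same shape as the single lexer rule: '$', one letter,
  // then any number of letters, digits, '-' or '_'.
  private static final Pattern VARIABLE =
      Pattern.compile("\\$[a-zA-Z][a-zA-Z0-9\\-_]*");

  static List<String> findVariables(String input) {
    List<String> found = new ArrayList<>();
    Matcher m = VARIABLE.matcher(input);
    while (m.find()) {
      found.add(m.group()); // everything between matches plays the role of skipped TEXT
    }
    return found;
  }

  public static void main(String[] args) {
    // prints [$Mu, $foo]
    System.out.println(findVariables("<strong>$Mu</strong>$foo..."));
  }
}
```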
