Why does parsing fail after upgrading from Antlr 3 to Antlr 4?

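I have recently been trying to upgrade my project from Antlr3 to Antlr4. However, after changing the grammar file, equations that used to parse no longer seem to work. I am new to Antlr4, so I cannot tell whether my changes broke something.

Here is my original grammar file: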

grammar equation;
options {
    language=CSharp2;
    output=AST;
    ASTLabelType=CommonTree;
}   

tokens {
    VARIABLE;  
    CONSTANT;  
    EXPR;
    PAREXPR;
    EQUATION;
    UNARYEXPR;
    FUNCTION;
    BINARYOP;
    LIST;
}


equationset:    equation* EOF!;
equation:   variable ASSIGN expression -> ^(EQUATION variable expression)
    ;

parExpression 
    :   LPAREN expression RPAREN -> ^(PAREXPR expression)
    ;

expression
    :   conditionalexpression -> ^(EXPR conditionalexpression)
    ;

conditionalexpression
    :   orExpression
    ;

orExpression
    :   andExpression ( OR^ andExpression )* 
    ;

andExpression
    :   comparisonExpression ( AND^ comparisonExpression )*;


comparisonExpression: 
    additiveExpression ((EQ^ | NE^ | LTE^ | GTE^ | LT^ | GT^) additiveExpression)*;


additiveExpression
    :   multiplicativeExpression ( (PLUS^ | MINUS^) multiplicativeExpression )*
    ;

multiplicativeExpression
    :   unaryExpression ( ( TIMES^ | DIVIDE^) unaryExpression )*
    ;

unaryExpression
    :   NOT unaryExpression -> ^(UNARYEXPR NOT unaryExpression)
    |   MINUS unaryExpression  -> ^(UNARYEXPR MINUS unaryExpression)
    | exponentexpression;

exponentexpression
    :   primary (CARET^ primary)*;

primary :   parExpression | constant | booleantok | variable | function;

numeric:        INTEGER | REAL;
constant:       STRING -> ^(CONSTANT STRING) | numeric -> ^(CONSTANT numeric);
booleantok  :   BOOLEAN -> ^(BOOLEAN);
scopedidentifier
    :   (IDENTIFIER DOT)* IDENTIFIER -> IDENTIFIER+;
function
    :   scopedidentifier LPAREN argumentlist RPAREN -> ^(FUNCTION scopedidentifier argumentlist);
variable:   scopedidentifier -> ^(VARIABLE scopedidentifier);

argumentlist:   (expression) ? (COMMA! expression)*;  

WS  : (' '|'\r'|'\n'|'\t')+ {$channel=HIDDEN;};

COMMENT :   '/*' .* '*/' {$channel=HIDDEN;};

LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;};

STRING: (('\"') ( (~('\"')) )* ('\"'))+;

fragment ALPHA: 'a'..'z'|'_';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA|DIGIT;

EQ  :   '==';
ASSIGN  :   '=';
NE  :   '!=' | '<>';
OR  :   'or' | '||';
AND :   'and' | '&&';
NOT :   '!'|'not';
LTE :   '<=';
GTE :   '>=';
LT  :   '<';
GT  :   '>';
TIMES   :   '*';
DIVIDE  :   '/';

BOOLEAN :   'true' | 'false';

IDENTIFIER: ALPHA (ALNUM)* | ('[' (~(']'))+ ']') ;

REAL: DIGIT* DOT DIGIT+ ('e' (PLUS | MINUS)? DIGIT+)?;
INTEGER: DIGIT+;


PLUS    :   '+';
MINUS   :   '-';
COMMA   :   ',';
RPAREN  :   ')';
LPAREN  :   '(';
DOT :   '.';
CARET   :   '^';

Here is what I have after my changes:

grammar equation;
options {

}   

tokens {
    VARIABLE;  
    CONSTANT;  
    EXPR;
    PAREXPR;
    EQUATION;
    UNARYEXPR;
    FUNCTION;
    BINARYOP;
    LIST;
}


equationset:    equation* EOF;
equation:   variable ASSIGN expression
    ;

parExpression 
    :   LPAREN expression RPAREN
    ;

expression
    :   conditionalexpression
    ;

conditionalexpression
    :   orExpression
    ;

orExpression
    :   andExpression ( OR andExpression )* 
    ;

andExpression
    :   comparisonExpression ( AND comparisonExpression )*;


comparisonExpression: 
    additiveExpression ((EQ | NE | LTE | GTE | LT | GT) additiveExpression)*;


additiveExpression
    :   multiplicativeExpression ( (PLUS | MINUS) multiplicativeExpression )*
    ;

multiplicativeExpression
    :   unaryExpression ( ( TIMES | DIVIDE) unaryExpression )*
    ;

unaryExpression
    :   NOT unaryExpression
    |   MINUS unaryExpression
    | exponentexpression;

exponentexpression
    :   primary (CARET primary)*;

primary :   parExpression | constant | booleantok | variable | function;

numeric:        INTEGER | REAL;
constant:       STRING | numeric;
booleantok  :   BOOLEAN;
scopedidentifier
    :   (IDENTIFIER DOT)* IDENTIFIER;
function
    :   scopedidentifier LPAREN argumentlist RPAREN;
variable:   scopedidentifier;

argumentlist:   (expression) ? (COMMA expression)*;  

WS  : (' '|'\r'|'\n'|'\t')+ ->channel(HIDDEN);

COMMENT :   '/*' .* '*/' ->channel(HIDDEN);

LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' ->channel(HIDDEN);

STRING: (('\"') ( (~('\"')) )* ('\"'))+;

fragment ALPHA: 'a'..'z'|'_';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA|DIGIT;

EQ  :   '==';
ASSIGN  :   '=';
NE  :   '!=' | '<>';
OR  :   'or' | '||';
AND :   'and' | '&&';
NOT :   '!'|'not';
LTE :   '<=';
GTE :   '>=';
LT  :   '<';
GT  :   '>';
TIMES   :   '*';
DIVIDE  :   '/';

BOOLEAN :   'true' | 'false';

IDENTIFIER: ALPHA (ALNUM)* | ('[' (~(']'))+ ']') ;

REAL: DIGIT* DOT DIGIT+ ('e' (PLUS | MINUS)? DIGIT+)?;
INTEGER: DIGIT+;


PLUS    :   '+';
MINUS   :   '-';
COMMA   :   ',';
RPAREN  :   ')';
LPAREN  :   '(';
DOT :   '.';
CARET   :   '^';

A sample equation I am trying to parse (it worked fine before) is:

[a].[b] = 1.76 * [Product_DC].[PDC_Inbound_Pallets] * if(product_dc.[PDC_DC] =="US84",1,0)

Thanks in advance.

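  • In ANTLR 4, the entries in the tokens block must be separated by commas (,), not semicolons (;). See also the Tokens Section paragraph in the official documentation.
  • Since ANTLR 4.7, a double quote inside a single-quoted literal must not be escaped with a backslash. STRING: (('\"') ( (~('\"')) )* ('\"'))+; should be rewritten as STRING: ('"' ~'"'* '"')+;
  • You left out the question mark that makes the match non-greedy in the multi-line comment token: '/*' .* '*/' should be '/*' .*? '*/'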

So the fixed grammar looks like this:

grammar equation;

options {

}   

tokens {
    VARIABLE,
    CONSTANT,
    EXPR,
    PAREXPR,
    EQUATION,
    UNARYEXPR,
    FUNCTION,
    BINARYOP,
    LIST
}


equationset:    equation* EOF;
equation:   variable ASSIGN expression
    ;

parExpression 
    :   LPAREN expression RPAREN
    ;

expression
    :   conditionalexpression
    ;

conditionalexpression
    :   orExpression
    ;

orExpression
    :   andExpression ( OR andExpression )* 
    ;

andExpression
    :   comparisonExpression ( AND comparisonExpression )*;


comparisonExpression: 
    additiveExpression ((EQ | NE | LTE | GTE | LT | GT) additiveExpression)*;


additiveExpression
    :   multiplicativeExpression ( (PLUS | MINUS) multiplicativeExpression )*
    ;

multiplicativeExpression
    :   unaryExpression ( ( TIMES | DIVIDE) unaryExpression )*
    ;

unaryExpression
    :   NOT unaryExpression
    |   MINUS unaryExpression
    | exponentexpression;

exponentexpression
    :   primary (CARET primary)*;

primary :   parExpression | constant | booleantok | variable | function;

numeric:        INTEGER | REAL;
constant:       STRING | numeric;
booleantok  :   BOOLEAN;
scopedidentifier
    :   (IDENTIFIER DOT)* IDENTIFIER;
function
    :   scopedidentifier LPAREN argumentlist RPAREN;
variable:   scopedidentifier;

argumentlist:   (expression) ? (COMMA expression)*;  

WS  : (' '|'\r'|'\n'|'\t')+ ->channel(HIDDEN);

COMMENT :   '/*' .*? '*/' -> channel(HIDDEN);

LINE_COMMENT : '//' ~('\n'|'\r')* '\r'? '\n' ->channel(HIDDEN);

STRING: ('"' ~'"'* '"')+;

fragment ALPHA: 'a'..'z'|'_';
fragment DIGIT: '0'..'9';
fragment ALNUM: ALPHA|DIGIT;

EQ  :   '==';
ASSIGN  :   '=';
NE  :   '!=' | '<>';
OR  :   'or' | '||';
AND :   'and' | '&&';
NOT :   '!'|'not';
LTE :   '<=';
GTE :   '>=';
LT  :   '<';
GT  :   '>';
TIMES   :   '*';
DIVIDE  :   '/';

BOOLEAN :   'true' | 'false';

IDENTIFIER: ALPHA (ALNUM)* | ('[' (~(']'))+ ']') ;

REAL: DIGIT* DOT DIGIT+ ('e' (PLUS | MINUS)? DIGIT+)?;
INTEGER: DIGIT+;


PLUS    :   '+';
MINUS   :   '-';
COMMA   :   ',';
RPAREN  :   ')';
LPAREN  :   '(';
DOT :   '.';
CARET   :   '^';
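
The difference between '/*' .* '*/' and '/*' .*? '*/' in the COMMENT rule can be illustrated by analogy with Python's re module. This is only a sketch: the regular expressions below stand in for the ANTLR lexer rule, the sample input is made up, and ANTLR's lexer does not behave identically to a regex engine in every detail. As far as I understand it, though, the greedy/non-greedy distinction carries over.

```python
import re

# Hypothetical input: two block comments with an equation between them.
source = "/* first */ a = 1 /* second */"

# Greedy `.*` runs to the LAST "*/", swallowing the equation in between.
greedy = re.compile(r"/\*.*\*/", re.DOTALL)

# Non-greedy `.*?` stops at the FIRST "*/", giving one match per comment.
non_greedy = re.compile(r"/\*.*?\*/", re.DOTALL)

greedy_matches = greedy.findall(source)
non_greedy_matches = non_greedy.findall(source)

print(greedy_matches)      # a single match spanning the whole input
print(non_greedy_matches)  # two separate comment matches
```

With the greedy form, everything between the first comment opener and the last comment closer ends up on the hidden channel, so an equation sitting between two comments would silently disappear from the token stream.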