ANTLR4: Priority on ternary expression
The Jakarta Expression Language specification contains the following condition:
Qualified functions with a namespace prefix have precedence over the operators. Thus the expression ${c?b:f()} is illegal because b:f() is being parsed as a qualified function instead of part of a conditional expression. As usual, () can be used to make the precedence explicit, e.g. ${c?b:(f())}.
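To make the quoted rule concrete, here is a minimal sketch in plain Java that flags the token shape the specification describes: a `?` followed by `IDENT : IDENT (`. The class name, the method name, and the simplified identifier pattern are assumptions made for illustration. This is a text-level approximation only (it would also fire inside string literals), not how any EL implementation or the grammar below works.

```java
import java.util.regex.Pattern;

// Hypothetical helper: detects "? IDENT : IDENT (" in raw expression text.
final class QualifiedFunctionCheck {
    // Simplified identifier: a letter, '_' or '$', then letters, digits, '_' or '$'.
    private static final String IDENT = "[\\p{L}_$][\\p{L}\\p{Nd}_$]*";

    // '?' then IDENT ':' IDENT '(' with optional whitespace between tokens.
    private static final Pattern AFTER_QUESTION_MARK = Pattern.compile(
            "\\?\\s*" + IDENT + "\\s*:\\s*" + IDENT + "\\s*\\(");

    // Returns true when the text contains the shape that the quoted rule
    // says is read as a qualified function rather than a conditional.
    static boolean hasQualifiedFunctionAfterQuestionMark(String expression) {
        return AFTER_QUESTION_MARK.matcher(expression).find();
    }
}
```

With this sketch, `${c?b:f()}` is flagged, while `${c?b:(f())}` and `${c?b:d}` are not, which matches the two examples in the quoted passage.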
grammar ExpressionLanguageGrammar;
prog: compositeExpression;
compositeExpression: (dynamicExpression | deferredExpression | literalExpression)*;
dynamicExpression: DYNAMIC_START expression RCURL;
deferredExpression: DEFERRED_START expression RCURL;
literalExpression: literal;
literal: booleanLiteralExpression | floatingPointLiteralExpression | integerLiteralExpression | stringLiteralExpression | nullLiteralExpression;
booleanLiteralExpression: BOOL_LITERAL;
floatingPointLiteralExpression: FLOATING_POINT_LITERAL;
integerLiteralExpression: INTEGER_LITERAL;
stringLiteralExpression: StringLiteral;
nullLiteralExpression: NULL;
arguments: LPAREN expressionList? RPAREN;
expressionList: (expression ((COMMA expression)*));
lambdaParameters: IDENTIFIER | (LPAREN (IDENTIFIER ((COMMA IDENTIFIER)*))? RPAREN);
mapEntry: expression COLON expression;
mapEntries: mapEntry (COMMA mapEntry)*;
expression
: expression (LBRACK expression RBRACK) #memberIndexExpression
| expression bop=DOT (IDENTIFIER) #memberDotExpression
| expression arguments #callExpression
| prefix=(MINUS | NOT | EMPTY) expression #unaryExpression
| expression bop=(MULT | DIV | MOD ) expression #infixExpression
| expression bop=(PLUS | MINUS) expression #infixExpression
| expression bop=(LE | GE | LT | GT) expression #relationalExpression
| expression bop=INSTANCEOF IDENTIFIER #infixExpression
| expression bop=(EQ | NE) expression #relationalExpression
| expression bop=AND expression #logicalExpression
| expression bop=OR expression #logicalExpression
| IDENTIFIER (COLON IDENTIFIER)? arguments #namespaceFunctionExpression
| <assoc=right> expression bop=QUESTIONMARK expression bop=COLON expression #ternaryExpression
| <assoc=right> expression bop=(ASSIGN | CONCAT) expression #assignExpression
| lambdaParameters ARROW expression #lambdaExpression
| expression SEMICOLON expression #semicolonExpression
| IDENTIFIER #identifierExpression
| literal #literalExpr
| LBRACK expressionList? RBRACK #listExpression
| LCURL expressionList? RCURL #setExpression
| LCURL mapEntries? RCURL #mapExpression
| LPAREN expression RPAREN #parenExpression
;
// LEXER
LCURL: '{';
RCURL: '}';
BOOL_LITERAL: TRUE | FALSE;
TRUE: 'true';
FALSE: 'false';
NULL: 'null';
DOT: '.';
LPAREN: '(';
RPAREN: ')';
LBRACK: '[';
RBRACK: ']';
COLON: ':';
COMMA: ',';
SEMICOLON: ';';
GT: ('>' | 'gt');
LT: ('<' | 'lt');
GE: ('>=' | 'ge');
LE: ('<=' | 'le');
EQ: ('==' | 'eq');
NE: ('!=' | 'ne');
NOT: ('!' | 'not');
AND: ('&&' | 'and');
OR: ('||' | 'or');
EMPTY: 'empty';
INSTANCEOF: 'instanceof';
MULT: '*';
PLUS: '+';
MINUS: '-';
QUESTIONMARK: '?';
DIV: ('/' | 'div');
MOD: ('%' | 'mod');
CONCAT: '+=';
ASSIGN: '=';
ARROW: '->';
DYNAMIC_START: DOLLAR LCURL;
DEFERRED_START: HASH LCURL;
DOLLAR: '$';
HASH: '#';
INTEGER_LITERAL: [0-9]+;
FLOATING_POINT_LITERAL: [0-9]+ '.' [0-9]* EXPONENT? | '.' [0-9]+ EXPONENT? | [0-9]+ EXPONENT?;
fragment EXPONENT: ('e'|'E') ('+'|'-')? [0-9]+;
StringLiteral: ('"' DoubleStringCharacter* '"'
| '\'' SingleStringCharacter* '\'') ;
fragment DoubleStringCharacter
: ~["\\\r\n]
| '\\' EscapeSequence
;
fragment SingleStringCharacter
: ~['\\\r\n]
| '\\' EscapeSequence
;
fragment EscapeSequence
: CharacterEscapeSequence
| '0'
| HexEscapeSequence
| UnicodeEscapeSequence
| ExtendedUnicodeEscapeSequence
;
fragment CharacterEscapeSequence
: SingleEscapeCharacter
| NonEscapeCharacter
;
fragment HexEscapeSequence
: 'x' HexDigit HexDigit
;
fragment UnicodeEscapeSequence
: 'u' HexDigit HexDigit HexDigit HexDigit
| 'u' '{' HexDigit HexDigit+ '}'
;
fragment ExtendedUnicodeEscapeSequence
: 'u' '{' HexDigit+ '}'
;
fragment SingleEscapeCharacter
: ['"\\bfnrtv]
;
fragment NonEscapeCharacter
: ~['"\\bfnrtv0-9xu\r\n]
;
fragment EscapeCharacter
: SingleEscapeCharacter
| [0-9]
| [xu]
;
fragment HexDigit
: [_0-9a-fA-F]
;
fragment DecimalIntegerLiteral
: '0'
| [1-9] [0-9_]*
;
fragment ExponentPart
: [eE] [+-]? [0-9_]+
;
fragment IdentifierPart
: IdentifierStart
| [\p{Mn}]
| [\p{Nd}]
| [\p{Pc}]
| '\u200C'
| '\u200D'
;
fragment IdentifierStart
: [\p{L}]
| [$_]
| '\\' UnicodeEscapeSequence
;
IDENTIFIER: LETTER (LETTER|DIGIT)*;
LETTER: '\u0024' |
'\u0041'..'\u005a' |
'\u005f' |
'\u0061'..'\u007a' |
'\u00c0'..'\u00d6' |
'\u00d8'..'\u00f6' |
'\u00f8'..'\u00ff' |
'\u0100'..'\u1fff' |
'\u3040'..'\u318f' |
'\u3300'..'\u337f' |
'\u3400'..'\u3d2d' |
'\u4e00'..'\u9fff' |
'\uf900'..'\ufaff';
DIGIT: '\u0030'..'\u0039'|
'\u0660'..'\u0669'|
'\u06f0'..'\u06f9'|
'\u0966'..'\u096f'|
'\u09e6'..'\u09ef'|
'\u0a66'..'\u0a6f'|
'\u0ae6'..'\u0aef'|
'\u0b66'..'\u0b6f'|
'\u0be7'..'\u0bef'|
'\u0c66'..'\u0c6f'|
'\u0ce6'..'\u0cef'|
'\u0d66'..'\u0d6f'|
'\u0e50'..'\u0e59'|
'\u0ed0'..'\u0ed9'|
'\u1040'..'\u1049';
WS: [ \t\r\n]+ -> skip;
ANY: .;
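How can I make this a syntax error? At the moment the expression is parsed without any error being returned. Should the error be handled in code, or can it be handled on the parser side?
Try the following:
I added a parser rule: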
qualifiedFunction: IDENTIFIER COLON IDENTIFIER arguments;
Then I made it the first alternative of the expression rule:
expression
: qualifiedFunction # QFunc
Then I replaced the ternaryExpression alternative with two alternatives (the order matters):
| expression QUESTIONMARK qualifiedFunction # badTernaryExpression
| expression QUESTIONMARK (trueExpr=expression COLON falseExpr=expression) # ternaryExpression
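A useful "trick" in ANTLR is to write a rule that recognizes a specific invalid construct, and let ANTLR build a tree for it that makes the construct easy to identify.
This is the only way I found to get ANTLR to recognize this case and create a parse tree that can be used to identify the invalid usage. (I don't quite understand why the parentheses are needed in the ternaryExpression alternative, but if I remove them, it recognizes your example as a regular ternary expression.)
Now I can create a listener: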
public class BadTernaryListener extends ExpressionLanguageGrammarBaseListener {
    // Called when the parser matched `expression ? qualifiedFunction`,
    // the construct the specification declares illegal.
    @Override
    public void enterBadTernaryExpression(ExpressionLanguageGrammarParser.BadTernaryExpressionContext ctx) {
        // Add your error to your error list here
        System.out.println("You can't use a qualified function here");
    }
}
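In practice, you would have some error handler that collects errors from the parse. You can pass it to the listener so that, when you encounter this usage, you use the same error handler to add whatever message you like.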
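A minimal sketch of such a shared collector, in plain Java with no ANTLR dependency. The class name `ErrorCollector` and its methods are hypothetical, chosen only for illustration. The idea is that both the syntax-error listener and the tree listener write into the same object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical collector shared by the syntax-error listener and the tree listener.
final class ErrorCollector {
    private final List<String> errors = new ArrayList<>();

    // Records one problem together with its source position.
    void report(int line, int column, String message) {
        errors.add("line " + line + ":" + column + " " + message);
    }

    boolean hasErrors() {
        return !errors.isEmpty();
    }

    // Read-only view, so callers cannot modify the collected list.
    List<String> getErrors() {
        return Collections.unmodifiableList(errors);
    }
}
```

Assuming the listener receives the collector through its constructor, `enterBadTernaryExpression` could call `collector.report(ctx.getStart().getLine(), ctx.getStart().getCharPositionInLine(), "...")` instead of printing. A class extending ANTLR's `BaseErrorListener` could forward its `syntaxError` callback to the same `report` method, so all messages end up in one list.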
I'm not entirely sure this covers ALL of the qualified-function precedence requirements, but it does detect this case.