How to enable combination of orExpression and andExpression rule in ANTLR?

Question

I want to parse the following with ANTLR4:

isSet(foo) or isSet(bar) and isSet(test)

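In the parse tree I can see that only the first or is recognized. I can add more or operators and the parse tree grows, but an additional and is not recognized. How can I define this in the grammar?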

This is my current grammar file:

        grammar Expr;
        
        prog: (stat)+;
        stat: (command | orExpression | andExpression | notExpression)+;
        orExpression: command ( OR command | XOR command)*;
        andExpression:command ( AND command)*;
        notExpression:NOT command;
        command:IS_SET LPAREN parameter RPAREN
                | IS_EMPTY LPAREN parameter RPAREN;
        parameter: ID;
        
        
        LPAREN : '(';
        RPAREN : ')';
        LBRACE : '{';
        RBRACE : '}';
        LBRACK : '[';
        RBRACK : ']';
        SEMI : ';';
        COMMA : ',';
        DOT : '.';
        ASSIGN : '=';
        GT : '>';
        LT : '<';
        BANG : '!';
        TILDE : '~';
        QUESTION : '?';
        COLON : ':';
        EQUAL : '==';
        LE : '<=';
        GE : '>=';
        NOTEQUAL : '!=';
        AND : 'and';
        OR : 'or';
        XOR :'xor';
        NOT :'not'  ;
        INC : '++';
        DEC : '--';
        ADD : '+';
        SUB : '-';
        MUL : '*';
        DIV : '/';
        
        INT: [0-9]+;
        NEWLINE: '\r'? '\n';
        IS_SET:'isSet';
        IS_EMPTY:'isEmpty';
        WS: [ \t]+ -> skip;
        ID
            :   JavaLetter JavaLetterOrDigit*
            ;
        
        fragment
        JavaLetter
            :   [a-zA-Z$_] // these are the "java letters" below 0xFF
            |   // covers all characters above 0xFF which are not a surrogate
                ~[\u0000-\u00FF\uD800-\uDBFF]
                {Character.isJavaIdentifierStart(_input.LA(-1))}?
            |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
                [\uD800-\uDBFF] [\uDC00-\uDFFF]
                {Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
            ;
        
        fragment
        JavaLetterOrDigit
            :   [a-zA-Z0-9$_] // these are the "java letters or digits" below 0xFF
            |   // covers all characters above 0xFF which are not a surrogate
                ~[\u0000-\u00FF\uD800-\uDBFF]
                {Character.isJavaIdentifierPart(_input.LA(-1))}?
            |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
                [\uD800-\uDBFF] [\uDC00-\uDFFF]
                {Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
            ;

Here you can see the parse tree; the andExpression is missing.

Answer 1

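Only the first part is parsed because the rule prog: (stat)+; is only told to parse at least one stat, and that is exactly what it does. If you want the parser to consume all tokens, "anchor" your start rule with the EOF token: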

prog : stat+ EOF;

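Now your input isSet(foo) or isSet(bar) and isSet(test) will produce an error message. The first part, isSet(foo) or isSet(bar), is still recognized as an orExpression, but the last part, and isSet(test), cannot be matched, because no alternative of stat can start with AND. The general idea is to layer the rules by precedence, so that the operands of each rule are the next, tighter-binding rule: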

prog          : stat+ EOF;
stat          : orExpression+;
orExpression  : andExpression ( OR andExpression | XOR andExpression)*;
andExpression : notExpression ( AND notExpression)*;
notExpression : NOT? command;
command       : IS_SET LPAREN parameter RPAREN
              | IS_EMPTY LPAREN parameter RPAREN;
parameter     : ID;
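
To see why this layering gives and a higher precedence than or, here is a small hand-written recursive-descent sketch in Java that mirrors the rules above one method per rule. It is not what ANTLR generates: the class name LayeredParserSketch, the regex tokenizer, and the parenthesized string output are assumptions made for illustration, and prog is simplified to a single expression followed by EOF.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hand-written sketch (not ANTLR output) that mirrors the layered rules. */
final class LayeredParserSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private LayeredParserSketch(String input) {
        // Tokens are identifiers/keywords and parentheses.
        // Any other character is silently ignored in this sketch.
        Matcher m = Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|[()]").matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
    }

    /** prog : orExpression EOF  (simplified to one expression). */
    static String parse(String input) {
        LayeredParserSketch p = new LayeredParserSketch(input);
        String tree = p.orExpression();
        // The EOF anchor: leftover tokens are an error instead of being ignored.
        if (p.pos < p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens.get(p.pos));
        }
        return tree;
    }

    // orExpression : andExpression ((OR | XOR) andExpression)*
    private String orExpression() {
        String left = andExpression();
        while (peekIs("or") || peekIs("xor")) {
            String op = next();
            String right = andExpression();
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    // andExpression : notExpression (AND notExpression)*
    private String andExpression() {
        String left = notExpression();
        while (peekIs("and")) {
            next();
            String right = notExpression();
            left = "(" + left + " and " + right + ")";
        }
        return left;
    }

    // notExpression : NOT? command
    private String notExpression() {
        if (peekIs("not")) {
            next();
            return "(not " + command() + ")";
        }
        return command();
    }

    // command : (IS_SET | IS_EMPTY) LPAREN parameter RPAREN
    private String command() {
        String name = next();
        if (!name.equals("isSet") && !name.equals("isEmpty")) {
            throw new IllegalArgumentException("expected isSet or isEmpty, got: " + name);
        }
        expect("(");
        String parameter = next(); // parameter : ID (not validated further here)
        expect(")");
        return name + "(" + parameter + ")";
    }

    private boolean peekIs(String text) {
        return pos < tokens.size() && tokens.get(pos).equals(text);
    }

    private String next() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return tokens.get(pos++);
    }

    private void expect(String text) {
        String actual = next();
        if (!actual.equals(text)) {
            throw new IllegalArgumentException("expected " + text + ", got: " + actual);
        }
    }

    public static void main(String[] args) {
        // Prints: (isSet(foo) or (isSet(bar) and isSet(test)))
        System.out.println(parse("isSet(foo) or isSet(bar) and isSet(test)"));
    }
}
```

Because orExpression asks andExpression for each of its operands, the whole isSet(bar) and isSet(test) group is consumed before the or is combined, which is the nesting you would expect to see in the ANTLR parse tree as well.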

But ANTLR4 supports direct left-recursive rules, so you could also write the rules above like this:

prog: expr+ EOF;

expr
 : NOT expr                  #NotExpr
 | expr AND expr             #AndExpr
 | expr (OR | XOR) expr      #OrExpr
 | IS_SET LPAREN expr RPAREN #CommandExpr
 | ID                        #IdExpr
 ;

In my opinion, that is much nicer.
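
In a left-recursive rule, ANTLR4 resolves the ambiguity in favor of the alternative listed first, so NOT binds tightest, then AND, then OR and XOR, and binary operators are left-associative by default. The following Java sketch imitates that behavior with a plain precedence-climbing loop. It is only an illustration, not ANTLR's generated code: the class name PrecedenceClimbingSketch and the numeric precedence levels are assumptions, tokens must be separated by whitespace, and only the NOT, AND, OR/XOR and ID alternatives are covered.

```java
/** Precedence-climbing sketch; higher numbers bind tighter. */
final class PrecedenceClimbingSketch {
    private static final int NOT_LEVEL = 3;

    private final String[] tokens;
    private int pos = 0;

    private PrecedenceClimbingSketch(String input) {
        // Assumes whitespace-separated tokens such as "a or b and c".
        tokens = input.trim().split("\\s+");
    }

    static String parse(String input) {
        PrecedenceClimbingSketch p = new PrecedenceClimbingSketch(input);
        String tree = p.expr(1);
        // Same idea as the EOF anchor: all tokens must be consumed.
        if (p.pos < p.tokens.length) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens[p.pos]);
        }
        return tree;
    }

    // Binary operator levels, following the order of the alternatives.
    private static int precedence(String token) {
        switch (token) {
            case "and":
                return 2;
            case "or":
            case "xor":
                return 1;
            default:
                return 0; // not a binary operator
        }
    }

    private String expr(int minPrecedence) {
        String left = operand();
        while (pos < tokens.length && precedence(tokens[pos]) >= minPrecedence) {
            String op = tokens[pos++];
            // Left-associative: the right operand must bind strictly tighter.
            String right = expr(precedence(op) + 1);
            left = "(" + left + " " + op + " " + right + ")";
        }
        return left;
    }

    // Either "not expr" or a plain identifier.
    private String operand() {
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        String token = tokens[pos++];
        if (token.equals("not")) {
            return "(not " + expr(NOT_LEVEL) + ")";
        }
        if (precedence(token) > 0) {
            throw new IllegalArgumentException("operand expected, got: " + token);
        }
        return token;
    }

    public static void main(String[] args) {
        // Prints: (a or (b and c))
        System.out.println(parse("a or b and c"));
    }
}
```

Swapping the AndExpr and OrExpr alternatives in the grammar corresponds to swapping the two levels returned by precedence here, which is why the order of alternatives matters in the left-recursive form.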
