My grammar identifies keywords as identifiers
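Almost every word gets recognized as an identifier, and the more complex rules are never even reached. For example, 'program' is recognized as a Condition, and 'integer a,b;' is not recognized as a Decl_list: only the 'integer' part is recognized, as a Decl.
Any idea why?
I'm using this code for testing: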
program test1
declare
integer a, b, c;
integer result;
begin
read (a);
read (c);
b := 10;
result := (a * c)/(b + 5) ;
write(result);
end
lexer grammar MiniLexer;
Program: 'program' Identifier Body;
Body: ('declare' Decl_list) 'begin' Stmt_list 'end';
Decl_list: Decl ';' (Decl ';')?;
Decl: Type Ident_list;
fragment
Ident_list: (Identifier ','?)*;
Type: 'integer' | 'decimal';
Stmt_list: Stmt ';' ((Stmt ';')*)?;
Stmt: Assign_stmt | If_stmt | While_stmt| Read_stmt | Write_stmt;
Assign_stmt: Identifier ':=' Simple_expr;
If_stmt: 'if' Condition 'then' Stmt_list 'end' | 'if' Condition 'then' Stmt_list 'else' Stmt_list 'end';
Condition: Expression;
For_stmt: 'for' Assign_stmt 'to' Condition 'do' Stmt_list 'end';
While_stmt: 'while' Condition 'do' Stmt_list 'end';
Read_stmt: 'read' '(' Identifier ')';
Write_stmt: 'write' '(' Writable ')';
Writable: Simple_expr | Literal;
Expression: Simple_expr | Simple_expr Relop Simple_expr;
Simple_expr: Term | Term Addop Term| '(' Term ')' ? Term ':' Term;
Term: Factor_a | Factor_a Mulop Factor_a;
Factor_a: Factor | 'not' Factor | '-' Factor;
Factor: Identifier | Constant | '(' Expression ')';
Relop: '=' | '>' | '>=' | '<' | '<=' | '<>';
Addop: '+' | '-' | 'or';
Mulop: '*' | '/' | 'mod' | 'and';
Shiftop: '<<' | '>>' | '<<<' | '>>>';
COMENTARIO: '%' ~('\n'|'\r')* '\r'? '\n' {skip();};
WS : ( ' '| '\t'| '\r'| '\n') {skip();};
Constant: ('0'..'9') (('0'..'9'))*;
Literal: '"' ('\u0000'..'\uFFFE')* '"';
Identifier: ('a'..'z'|'A'..'Z') (('a'..'z'|'A'..'Z') | ('0'..'9'))*;
Any idea why?
Your grammar is a lexer grammar, which means it only produces tokens. Read about the difference between lexer, parser and combined grammars here: https://github.com/antlr/antlr4/blob/master/doc/grammars.md
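Why 'program' comes out as a Condition: in a lexer grammar every rule defines a token, so rules such as Program, Body or Decl_list are never assembled from smaller tokens. They could only match if the input contained no whitespace at all, because {skip();} discards whitespace between tokens, not inside one. At each position ANTLR's lexer takes the longest match and, when several rules match the same length, the rule defined first wins. 'program' is matched equally well by every rule that can reduce to a bare Identifier, and Condition is the first of those in the file. 'integer' goes to Decl because Ident_list may be empty, so Decl matches 'integer' alone and is defined before Type. The Python sketch below mimics that tie-breaking; the tokenize helper and the regexes are hypothetical approximations of the question's rules, not ANTLR output.

```python
import re

# Hypothetical regex approximations of a few rules from the question's lexer
# grammar, kept in the same order as in the grammar (order matters on ties).
# Condition is reduced to the one alternative that matters here: a bare
# Identifier (Condition -> Expression -> ... -> Factor -> Identifier).
QUESTION_RULES = [
    ("Decl", r"(?:integer|decimal)(?:[A-Za-z][A-Za-z0-9]*,?)*"),  # Type Ident_list, no spaces
    ("Type", r"integer|decimal"),
    ("Condition", r"[A-Za-z][A-Za-z0-9]*"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
]
WHITESPACE = re.compile(r"[ \t\r\n]+")  # skipped between tokens, like WS {skip();}


def tokenize(text, rules):
    """Longest match wins; on equal length the rule listed first wins."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        ws = WHITESPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        best_name, best_end = None, pos
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strictly greater: a later rule never steals a tie.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at {pos}: {text[pos]!r}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("program test1 integer a", QUESTION_RULES))
# prints [('Condition', 'program'), ('Condition', 'test1'), ('Decl', 'integer'), ('Condition', 'a')]
```

The nested rules never get a chance: each word is claimed by whichever token rule happens to come first in the file.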
In short: remove the word lexer from the grammar and turn some of the rules into parser rules (parser rules start with a lowercase letter):
grammar Mini;
program: 'program' Identifier body EOF;
body: ('declare' decl_list) 'begin' stmt_list 'end';
decl_list: decl ';' (decl ';')*;   // '*' instead of '?' so more than two declarations are allowed
decl: type ident_list;
ident_list: Identifier (',' Identifier)*;   // a, b, c (no empty list, no trailing comma)
type: 'integer' | 'decimal';
stmt_list: stmt ';' (stmt ';')*;
stmt: assign_stmt | if_stmt | while_stmt| read_stmt | write_stmt | for_stmt;
assign_stmt: Identifier ':=' simple_expr;
if_stmt: 'if' condition 'then' stmt_list 'end' | 'if' condition 'then' stmt_list 'else' stmt_list 'end';
condition: expression;
for_stmt: 'for' assign_stmt 'to' condition 'do' stmt_list 'end';
while_stmt: 'while' condition 'do' stmt_list 'end';
read_stmt: 'read' '(' Identifier ')';
write_stmt: 'write' '(' writable ')';
writable: simple_expr | Literal;
expression: simple_expr | simple_expr Relop simple_expr;
simple_expr: term | term Addop term| '(' term ')' ? term ':' term;
term: factor_a | factor_a Mulop factor_a;
factor_a: factor | 'not' factor | '-' factor;
factor: Identifier | Constant | '(' expression ')';
Relop: '=' | '>' | '>=' | '<' | '<=' | '<>';
Addop: '+' | '-' | 'or';
Mulop: '*' | '/' | 'mod' | 'and';
Shiftop: '<<' | '>>' | '<<<' | '>>>';
COMENTARIO: '%' ~('\n'|'\r')* '\r'? '\n' -> skip;
Constant: ('0'..'9') (('0'..'9'))*;
Literal: '"' ~["\r\n]* '"';   // stops at the first closing quote instead of running to the last one
Identifier: ('a'..'z'|'A'..'Z') (('a'..'z'|'A'..'Z') | ('0'..'9'))*;
Space: [ \t\r\n] -> skip;
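Why the combined grammar fixes the keyword problem: ANTLR creates an implicit token for every literal used in a parser rule ('program', 'declare', 'integer', ...; in the generated code they are named T__0, T__1, ...) and gives those implicit tokens priority over the lexer rules written in the grammar. Longest match still applies first, so a word such as 'programs' remains an Identifier. A minimal Python sketch of that ordering, using a hypothetical tokenize helper and regex approximations rather than generated ANTLR code (the T_... names are made up for readability):

```python
import re

# Hypothetical approximation of the combined grammar's token order:
# implicit literal tokens first, then the explicit lexer rule Identifier.
COMBINED_RULES = [
    ("T_program", r"program"),
    ("T_declare", r"declare"),
    ("T_integer", r"integer"),
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
]
SPACE = re.compile(r"[ \t\r\n]+")  # like Space: [ \t\r\n] -> skip;


def tokenize(text, rules):
    """Longest match wins; on equal length the rule listed first wins."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens, pos = [], 0
    while pos < len(text):
        ws = SPACE.match(text, pos)
        if ws:
            pos = ws.end()
            continue
        best_name, best_end = None, pos
        for name, regex in compiled:
            m = regex.match(text, pos)
            if m and m.end() > best_end:  # ties keep the earlier rule
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no rule matches at {pos}: {text[pos]!r}")
        tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("program test1 declare integer a programs", COMBINED_RULES))
# prints [('T_program', 'program'), ('Identifier', 'test1'), ('T_declare', 'declare'), ('T_integer', 'integer'), ('Identifier', 'a'), ('Identifier', 'programs')]
```

Because ties go to the earlier rule, the same effect can be had in a separate lexer grammar by defining keyword rules (for example a hypothetical PROGRAM: 'program'; rule) above Identifier.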
Note that {skip();} is old v3 syntax; use -> skip instead.
And Constant: ('0'..'9') (('0'..'9'))*; is also old v3 syntax (although it still works in v4). The preferred way looks like this:
Constant: [0-9] (([0-9]))*;
which can simply be written as:
Constant: [0-9]+;
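A quick sanity check that the three spellings of Constant accept exactly the same strings. It uses Python's re module as a stand-in for ANTLR's lexer (an assumption; both read [0-9] as a single ASCII digit) and tries every string of length 0 to 4 over a small alphabet:

```python
import re
from itertools import product

# The three spellings of Constant from the answer, translated to Python regexes.
SPELLINGS = {
    "v3 range": re.compile(r"[0-9]([0-9])*"),   # ('0'..'9') (('0'..'9'))*
    "v4 set": re.compile(r"[0-9](([0-9]))*"),   # [0-9] (([0-9]))*
    "v4 short": re.compile(r"[0-9]+"),          # [0-9]+
}


def accepted_by(text):
    """Return the names of the spellings that match the whole of text."""
    return {name for name, regex in SPELLINGS.items() if regex.fullmatch(text)}


# Every string of length 0..4 mixing two digits and a letter.
samples = ["".join(chars) for n in range(5) for chars in product("07x", repeat=n)]

# A mismatch would be a string accepted by some spellings but not all of them.
mismatches = [s for s in samples if accepted_by(s) not in (set(), set(SPELLINGS))]
print(mismatches)  # prints []
```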