Parse code blocks starting and ending with specific keywords

Question

Following this question, I want to parse blocks of code that start with specific keywords (e.g., <firstKeyword>, <secondKeyword>, <thirdKeyword>, ...) and end with the keyword End. In between, statements should end with either a semicolon or a new line. What I have done so far can be seen in this repository, but in short:

grammar <garammarName>;

// Parser Rules

statement: EndOfStatment;

statement_list: statement+;

section:
    '<firstKeyword>' statement_list End
    | '<secondKeyword>' statement_list End
    | '<thirdKeyword>' statement_list End;

sections: section+ EOF;

// Lexer Rules

End: 'End';

NewLine: ('\r'? '\n' | '\n' | '\r') -> skip;

WhiteSpace: [ \t\r\n]+ -> skip;

EndOfStatment: ';' | NewLine;

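However, the problem is that the TestRig / grun tool does not throw an error when a block is not terminated with the End keyword. For example, the sample input <exampleFile>: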

<firstKeyword>
End

<secondKeyword>

<thirdKeyword>
End

returns no errors when run with:

grun <garammarName> sections -tree < <exampleFile>

I would appreciate it if you could help me understand the problem and how to solve it.

Answer 1

When I run input similar to what you gave here, I get:

➜ grun ElmerSolver sections -tree  < examples/ex001.sif
line 6:0 missing 'End' at 'Equation'
(sections (section Simulation statement_list End) (section Constants statement_list <missing 'End'>) (section Equation 1 statement_list End) <EOF>)

That is specifically an error about the missing 'End' on line 6 (line 6:0 missing 'End' at 'Equation').

ANTLR's error recovery does supply the missing 'End' so that it can recover and continue parsing, but it still reports the error.

For reference, this is the complete grammar I used:

grammar ElmerSolver;

// Parser Rules

// eostmt: ';' | CR;

statement: EndOfStatment;

statement_list: statement*;

sections: section+ EOF;
// section: SectionName /* statement_list */ End;

// Lexer Rules

fragment DIGIT: [0-9];
Integer: DIGIT+;

Float:
    [+-]? (DIGIT+ ([.]DIGIT*)? | [.]DIGIT+) ([Ee][+-]? DIGIT+)?;

section:
    'Header' statement_list End                         # headerSection
    | 'Simulation' statement_list End                   # simulatorSection
    | 'Constants' statement_list End                    # constantsSection
    | 'Body' Integer statement_list End                 # bodySection
    | 'Material' Integer statement_list End             # materialSection
    | 'Body Force' Integer statement_list End           # bodyForceSection
    | 'Equation' Integer statement_list End             # equationSection
    | 'Solver' Integer statement_list End               # solverSection
    | 'Boundary Condition' Integer statement_list End   # boundaryConditionSection
    | 'Initial Condition' Integer statement_list End    # initialConditionSection
    | 'Component' Integer statement_list End            # componentSection;

End: 'End';

// statementEnd: ';' NewLine*;

NewLine: ('\r'? '\n' | '\n' | '\r') -> skip;

LineJoining:
    '\\' WhiteSpace? ('\r'? '\n' | '\r' | '\f') -> skip;

WhiteSpace: [ \t\r\n]+ -> skip;

LineComment: '#' ~( '\r' | '\n')* -> skip;

EndOfStatment: ';' | NewLine;

(I made a change to the EndOfStatment lexer rule.)

This is the input file I used:

Simulation
End

Constants 

Equation 1
End

This is the graphical view I get with the grun -gui option: [image: parse tree]

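Re: your change to the EndOfStatment rule.

EndOfStatment should be a parser rule (its name should start with a lowercase letter).

Also, with your grammar as written, '\n' will always be recognized as a NewLine token, and the -> skip command keeps it out of the token stream.

Run grun with the -tokens option and you will not see any EndOfStatment tokens (unless you put a ';' in your source file).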

➜ grun ElmerSolver sections -tree -tokens < examples/ex001.sif
[@0,0:9='Simulation',<'Simulation'>,1:0]
[@1,11:13='End',<'End'>,2:0]
[@2,16:24='Constants',<'Constants'>,4:0]
[@3,28:35='Equation',<'Equation'>,6:0]
[@4,37:37='1',<Integer>,6:9]
[@5,39:41='End',<'End'>,7:0]
[@6,42:41='<EOF>',<EOF>,7:3]
line 6:0 missing 'End' at 'Equation'
(sections (section Simulation statement_list End) (section Constants statement_list <missing 'End'>) (section Equation 1 statement_list End) <EOF>)
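
The absence of EndOfStatment tokens in the dump above can be read off the lexer rules themselves. The annotated fragment below is only a sketch and not part of the original grammar file: the name TokenOrderSketch is made up, and the comments assume ANTLR 4's usual lexer behavior of preferring the longest match and, on a tie, the rule that is defined first.

```antlr
lexer grammar TokenOrderSketch;   // hypothetical name, for illustration only

End: 'End';

// For a lone "\n", NewLine, WhiteSpace and EndOfStatment all match one
// character. The tie goes to the rule listed first, so NewLine wins,
// and "-> skip" discards the token before the parser ever sees it.
NewLine: ('\r'? '\n' | '\r') -> skip;

// For a longer run such as "\n\n" or " \n", WhiteSpace matches more
// characters than NewLine, wins on length, and is skipped as well.
WhiteSpace: [ \t\r\n]+ -> skip;

// ';' still yields an EndOfStatment token. For "\n" this rule ties with
// NewLine on length but is listed later, so it never wins.
EndOfStatment: ';' | NewLine;
```

Either way, every line break is discarded by a skip rule, which is consistent with the token dump showing only keywords, the Integer, and EOF.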

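If you want newlines to be syntactically meaningful (i.e., usable in your grammar rules), you need to remove the -> skip.

However, once you do that, you have to be specific about every place where a newline is valid (but I see your LineJoining token, so it looks like this is meant to have a bit of a Python feel, and that may be what you want). (The same comment about -> skip applies to it.) If you go the "Python-like" route, be aware that Python's EOL and indentation handling is a well-known complication for parsers (and "The Definitive ANTLR 4 Reference" has a section dedicated to the handling it requires). You can also consult the ANTLR Python grammar for reference.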
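
To make the suggestion concrete, here is a minimal sketch of a grammar in which line breaks reach the parser and the statement terminator is a parser rule. It is an assumption about one possible layout, not the grammar from the repository: the names SectionsSketch and eos are invented for illustration, only three of the section keywords are listed, and a statement is still just its terminator, as in the grammars above.

```antlr
grammar SectionsSketch;   // hypothetical name, not the grammar from the post

// Parser rules

// Blank lines may appear before, between, and after sections.
sections: NewLine* (section NewLine*)+ EOF;

section
    : 'Simulation' statement_list End
    | 'Constants' statement_list End
    | 'Equation' Integer statement_list End
    ;

statement_list: statement*;

// As in the post, a statement is only its terminator for now.
statement: eos;

// Parser rule (lowercase name): a semicolon or a line break ends a statement.
eos: ';' | NewLine;

// Lexer rules

End: 'End';
Integer: [0-9]+;

// A backslash before a line break joins two lines, so no NewLine is emitted.
LineJoining: '\\' [ \t]* ('\r'? '\n' | '\r') -> skip;

// No "-> skip" here: line breaks reach the parser as NewLine tokens.
NewLine: '\r'? '\n' | '\r';

// Only spaces and tabs are discarded; line breaks are excluded on purpose.
WhiteSpace: [ \t]+ -> skip;

LineComment: '#' ~[\r\n]* -> skip;
```

With this layout, the line breaks after Constants in the sample input are consumed as empty statements, so the parser should still report a syntax error when it reaches Equation without having seen End. The cost is the one mentioned above: every place where a blank line is allowed (here, around sections) has to be spelled out in the parser rules.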
