Parse code blocks starting and ending with specific keywords
Following this question, I want to parse blocks of code that start with specific keywords (e.g., <firstKeyword>, <secondKeyword>, <thirdKeyword>, ...) and end with the keyword End. In between, statements should end with either a semicolon or a newline. What I have done so far can be seen in this repository, but in short:
grammar <garammarName>;
// Parser Rules
statement: EndOfStatment;
statement_list: statement+;
section:
'<firstKeyword>' statement_list End
| '<secondKeyword>' statement_list End
| '<thirdKeyword>' statement_list End;
sections: section+ EOF;
// Lexer Rules
End: 'End';
NewLine: ('\r'? '\n' | '\n' | '\r') -> skip;
WhiteSpace: [ \t\r\n]+ -> skip;
EndOfStatment: ';' | NewLine;
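However, the problem is that the TestRig / grun tool does not throw an error when a block is not terminated with the End keyword. For example, the sample file <exampleFile>: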
<firstKeyword>
End
<secondKeyword>
<thirdKeyword>
End
returns no error when run with:
grun <garammarName> sections -tree < <exampleFile>
I would appreciate it if you could help me understand the problem and how to fix it.
When I run input similar to what you have given here, I get:
➜ grun ElmerSolver sections -tree < examples/ex001.sif
line 6:0 missing 'End' at 'Equation'
(sections (section Simulation statement_list End) (section Constants statement_list <missing 'End'>) (section Equation 1 statement_list End) <EOF>)
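That specifically reports an error for the missing 'End' at line 6 (line 6:0 missing 'End' at 'Equation').
ANTLR's error recovery does supply the missing 'End' so that it can recover and continue parsing, but it still reports the error.
For reference, here is the full grammar I used: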
grammar ElmerSolver;
// Parser Rules
// eostmt: ';' | CR;
statement: EndOfStatment;
statement_list: statement*;
sections: section+ EOF;
// section: SectionName /* statement_list */ End;
// Lexer Rules
fragment DIGIT: [0-9];
Integer: DIGIT+;
Float:
[+-]? (DIGIT+ ([.]DIGIT*)? | [.]DIGIT+) ([Ee][+-]? DIGIT+)?;
section:
'Header' statement_list End # headerSection
| 'Simulation' statement_list End # simulatorSection
| 'Constants' statement_list End # constantsSection
| 'Body' Integer statement_list End # bodySection
| 'Material' Integer statement_list End # materialSection
| 'Body Force' Integer statement_list End # bodyForceSection
| 'Equation' Integer statement_list End # equationSection
| 'Solver' Integer statement_list End # solverSection
| 'Boundary Condition' Integer statement_list End # boundaryConditionSection
| 'Initial Condition' Integer statement_list End # initialConditionSection
| 'Component' Integer statement_list End # componentSection;
End: 'End';
// statementEnd: ';' NewLine*;
NewLine: ('\r'? '\n' | '\n' | '\r') -> skip;
LineJoining:
'\\' WhiteSpace? ('\r'? '\n' | '\r' | '\f') -> skip;
WhiteSpace: [ \t\r\n]+ -> skip;
LineComment: '#' ~( '\r' | '\n')* -> skip;
EndOfStatment: ';' | NewLine;
(I made a change to the EndOfStatment lexer rule.)
Here is the input file I used:
Simulation
End
Constants
Equation 1
End
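And here is the graphical view I get using the grun -gui option.
Re: your change to the EndOfStatment rule.
EndOfStatment should be a parser rule (i.e., its name should start with a lowercase letter).
Also, with your grammar, '\n' will always be recognized as a NewLine token, and the -> skip command on that rule keeps it out of the token stream.
Run grun with the -tokens option and you will not see any EndOfStatment tokens (unless you put a ';' in your source file).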
➜ grun ElmerSolver sections -tree -tokens < examples/ex001.sif
[@0,0:9='Simulation',<'Simulation'>,1:0]
[@1,11:13='End',<'End'>,2:0]
[@2,16:24='Constants',<'Constants'>,4:0]
[@3,28:35='Equation',<'Equation'>,6:0]
[@4,37:37='1',<Integer>,6:9]
[@5,39:41='End',<'End'>,7:0]
[@6,42:41='<EOF>',<EOF>,7:3]
line 6:0 missing 'End' at 'Equation'
(sections (section Simulation statement_list End) (section Constants statement_list <missing 'End'>) (section Equation 1 statement_list End) <EOF>)
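If you want newlines to be syntactically significant (i.e., usable in your grammar), you need to remove the -> skip.
However, once you do that, you have to be specific about every place where a newline is valid (but I see your LineJoining token, so it looks like this is meant to have a bit of a Python feel, and that may be exactly what you want). (The same comment about -> skip applies to that rule.) If you go the "Python-like" route, be aware that Python's EOL and indentation handling is notoriously awkward for a parser to deal with (and "The Definitive ANTLR 4 Reference" has a section devoted to how it has to be handled). You can also consult the ANTLR Python grammar for reference.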
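As a rough illustration of the two points above (making the statement terminator a parser rule, and keeping newlines in the token stream), a cut-down grammar might look something like the following. This is only a sketch under assumptions, not a drop-in replacement: the grammar name ElmerSolverSketch, the eos rule, and the Identifier token standing in for a real statement body are all hypothetical, only two of the sections are shown, and LineJoining and LineComment are left out for brevity.

```antlr
grammar ElmerSolverSketch;   // hypothetical name, for illustration only

// Parser rules
sections: eos* section+ EOF;            // allow blank lines before the first section

section
    : 'Simulation' eos+ statement_list End eos*
    | 'Equation' Integer eos+ statement_list End eos*
    ;

statement_list: statement*;

// Placeholder statement (assumption): one identifier followed by a terminator.
statement: Identifier eos+;

// Statement terminator as a parser rule (lowercase name), not a lexer rule.
eos: ';' | NewLine;

// Lexer rules
End: 'End';
Integer: [0-9]+;
Identifier: [A-Za-z_] [A-Za-z0-9_]*;    // assumption: stands in for real statement content

NewLine: '\r'? '\n' | '\r';             // no "-> skip", so the parser can see it
WhiteSpace: [ \t]+ -> skip;             // must not match \r or \n, or it would swallow newlines
```

Note that WhiteSpace no longer includes \r and \n. If it did, a run of several newlines would be matched by WhiteSpace (longest match wins) and skipped, even with -> skip removed from NewLine. With a grammar along these lines, grun -tokens should show NewLine tokens, and a section that is missing its End should still be reported as a syntax error.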