JavaCC grammar - parse everything to the end of the file
I am using FeatureBNF (and so in essence I am using JavaCC) to try and write a grammar that will produce a (very) simple parser to parse Gherkin files.
Example Gherkin file:
Feature: Calculator
In order to avoid silly mistakes
As a math idiot
I want to be told the sum of two numbers
Scenario: Add two numbers
Given I have entered 50 into the calculator
And I have also entered 70 into the calculator
When I press add
Then the result should be 120 on the screen
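To begin with, all I want to do is parse this into a Feature with the name Calculator and a Body, which is the entire rest of the file.
However, I am struggling to read the rest of the file into Body. I think this may be partly because there is no 'natural' delimiter to show where a section ends; it is only indicated by a line break.
I am trying the following grammar: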
<DEFAULT> TOKEN :
{
<FEATURE: "Feature: " >
| <#LETTER: ["\u0027","\u0041"-"\u005a","\u005f","\u0061"-"\u007a"] >
| <FEATURE_NAME: (<LETTER>)+ >
| <NEWLINE: ("\r\n" | "\n\r" | "\r" | "\n") >
| <TEXT : ~[] >
}
GRAMMARSTART
Feature :
<FEATURE> FeatureName <NEWLINE>
Body
<EOF>
;
FeatureName: <FEATURE_NAME>;
Body: (<TEXT>)*;
I get the error:
[java] java.lang.reflect.InvocationTargetException
... lots of stack trace removed...
[java] Caused by: cide.gparser.ParseException: Encountered "\r\n" (5) at line 2, column 1.
[java] Was expecting one of:
[java] <EOF>
[java] <TEXT> ...
I have been able to achieve what I want by adding some delimiters to the Gherkin file and using lexical states, like so:
Feature: Calculator #TITLEEND
#BODYSTART
In order to avoid silly mistakes
As a math idiot
I want to be told the sum of two numbers
Scenario: Add two numbers
Given I have entered 50 into the calculator
And I have also entered 70 into the calculator
When I press add
Then the result should be 120 on the screen
#BODYEND
with the relevant part of the grammar as follows:
<DEFAULT, IN_BODY> SPECIAL_TOKEN : {
" " | "\t" | "\n" | "\r" | "\f"
}
<DEFAULT> TOKEN : {
<FEATURE: "Feature: " >
| <#LETTER: ["\u0027", "\u0041"-"\u005a", "\u005f", "\u0061"-"\u007a"] >
| <FEATURE_NAME: (<LETTER>)+ >
| <ENDFEATURETITLE: "#TITLEEND" >
}
<DEFAULT> TOKEN : { <BODYSTART : "#BODYSTART"> : IN_BODY }
<IN_BODY> TOKEN : { <TEXT : ~[] > }
<IN_BODY> TOKEN : { <BODYEND : "#BODYEND"> : DEFAULT }
GRAMMARSTART
Feature:
<FEATURE> FeatureName <ENDFEATURETITLE>
Body
<EOF>;
FeatureName: <FEATURE_NAME>;
Body: <BODYSTART> Text <BODYEND>;
Text: (<TEXT>)*;
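But I am sure I must be missing something, and I would like to achieve this without having to annotate the feature file. What is a better way to do this?
Side note
FeatureBNF is built on top of JavaCC and outputs a grammar file for JavaCC to process. I am completely new to both FeatureBNF and JavaCC, but they seem very similar, and I hope this question is one that JavaCC experts can answer. (FeatureBNF uses JavaCC syntax for the lexical specification and then its own format for the production rules of the grammar.)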
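In your first attempt every token is defined in the DEFAULT state, so a line break inside the body is still tokenized as NEWLINE (and a run of letters as FEATURE_NAME) rather than TEXT, which is why Body: (<TEXT>)* fails. Going by your grammar, you can switch state after the first line break, so the following lexical grammar is enough: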
<DEFAULT> TOKEN : {
<FEATURE: "Feature: " >
| <#LETTER: ["\u0027", "\u0041"-"\u005a", "\u005f", "\u0061"-"\u007a"] >
| <FEATURE_NAME: (<LETTER>)+ >
| <NEWLINE: ("\r\n" | "\n\r" | "\r" | "\n") > : IN_BODY
}
<IN_BODY> TOKEN : { <TEXT : ~[] > }
The syntactic grammar is then:
Feature:
<FEATURE> FeatureName <NEWLINE>
Body
<EOF>;
FeatureName: <FEATURE_NAME>;
Body: (<TEXT>)*;
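To make the state switch concrete, here is a minimal plain-Java sketch of what the two lexical states do. It is a hand-written illustration under my own assumptions, not the code that FeatureBNF or JavaCC generates; the class TwoStateLexerSketch, the Feature holder and the parse method are hypothetical names made up for this example.

```java
class TwoStateLexerSketch {
    // Mirrors the two lexical states used in the grammar above.
    enum State { DEFAULT, IN_BODY }

    // Hypothetical result holder; nothing in FeatureBNF or JavaCC produces this type.
    static final class Feature {
        final String name;
        final String body;

        Feature(String name, String body) {
            this.name = name;
            this.body = body;
        }
    }

    // Same character set as the private <#LETTER> token: ' A-Z _ a-z
    static boolean isLetter(char c) {
        return c == '\'' || (c >= 'A' && c <= 'Z') || c == '_' || (c >= 'a' && c <= 'z');
    }

    static Feature parse(String input) {
        final String keyword = "Feature: ";
        if (!input.startsWith(keyword)) {
            throw new IllegalArgumentException("Expected \"" + keyword + "\" at the start of the input");
        }
        State state = State.DEFAULT;
        StringBuilder name = new StringBuilder();
        StringBuilder body = new StringBuilder();
        int i = keyword.length();
        while (i < input.length()) {
            char c = input.charAt(i);
            if (state == State.IN_BODY) {
                // IN_BODY: every character is TEXT, including line breaks.
                body.append(c);
                i++;
            } else if (c == '\r' || c == '\n') {
                // DEFAULT: the first NEWLINE ends the title and switches to IN_BODY.
                if (name.length() == 0) {
                    throw new IllegalArgumentException("Expected a feature name before the line break");
                }
                char next = (i + 1 < input.length()) ? input.charAt(i + 1) : '\0';
                boolean twoCharBreak = (c == '\r' && next == '\n') || (c == '\n' && next == '\r');
                i += twoCharBreak ? 2 : 1;
                state = State.IN_BODY;
            } else if (isLetter(c)) {
                // DEFAULT: letters accumulate into FEATURE_NAME.
                name.append(c);
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' in the feature name");
            }
        }
        if (state != State.IN_BODY) {
            throw new IllegalArgumentException("Expected a line break after the feature name");
        }
        return new Feature(name.toString(), body.toString());
    }

    public static void main(String[] args) {
        Feature f = parse("Feature: Calculator\r\nIn order to avoid silly mistakes\r\nScenario: Add two numbers\r\n");
        System.out.println("name: " + f.name);
        System.out.println("body: " + f.body);
    }
}
```

Because every character after the first line break is accepted in IN_BODY, the body needs no closing delimiter: the end of the input ends it, which is the job <EOF> does in the grammar.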