JavaCC grammar - parse everything to the end of the file

I am using FeatureBNF (and so, in essence, JavaCC) to try to write a grammar that will produce a (very) simple parser for Gherkin files.

An example Gherkin file:

Feature: Calculator

   In order to avoid silly mistakes
   As a math idiot
   I want to be told the sum of two numbers

Scenario: Add two numbers
   Given I have entered 50 into the calculator
   And I have also entered 70 into the calculator
   When I press add
   Then the result should be 120 on the screen

To start with, all I want to do is parse this into a Feature with the name Calculator and a FeatureBody, which is the whole of the rest of the file.

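However, I have been struggling to read the rest of the file into Body. I think this may be partly because there is no 'natural' delimiter marking where a section ends; the end is indicated only by a line break.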

I am trying the following grammar:

<DEFAULT> TOKEN :
{
  <FEATURE: "Feature: " >
| <#LETTER: ["\u0027","\u0041"-"\u005a","\u005f","\u0061"-"\u007a"] >
| <FEATURE_NAME: (<LETTER>)+ >
| <NEWLINE: ("\r\n" | "\n\r" | "\r" | "\n") >
| <TEXT : ~[] >
}

GRAMMARSTART

Feature :
    <FEATURE> FeatureName <NEWLINE>
    Body
    <EOF>
    ;
FeatureName: <FEATURE_NAME>;
Body: (<TEXT>)*;

I get the error:

[java] java.lang.reflect.InvocationTargetException
... lots of stack trace removed...
[java] Caused by: cide.gparser.ParseException: Encountered "\r\n" (5) at line 2, column 1.
[java] Was expecting one of:
[java] <EOF>
[java] <TEXT> ...

I have been able to achieve what I want by adding some delimiters to the Gherkin file and using lexical states, as follows:

Feature: Calculator #TITLEEND
#BODYSTART
   In order to avoid silly mistakes
   As a math idiot
   I want to be told the sum of two numbers

Scenario: Add two numbers
   Given I have entered 50 into the calculator
   And I have also entered 70 into the calculator
   When I press add
   Then the result should be 120 on the screen
#BODYEND

The relevant part of the grammar is as follows:

<DEFAULT, IN_BODY> SPECIAL_TOKEN : {
  " " | "\t" | "\n" | "\r" | "\f"
}

<DEFAULT> TOKEN : {
  <FEATURE: "Feature: " >
| <#LETTER: ["\u0027", "\u0041"-"\u005a", "\u005f", "\u0061"-"\u007a"] >
| <FEATURE_NAME: (<LETTER>)+ >
| <ENDFEATURETITLE: "#TITLEEND" >
}

<DEFAULT> TOKEN : { <BODYSTART : "#BODYSTART"> : IN_BODY }
<IN_BODY> TOKEN : { <TEXT : ~[] > }

<IN_BODY> TOKEN : { <BODYEND : "#BODYEND"> : DEFAULT } 

GRAMMARSTART

Feature:
    <FEATURE> FeatureName <ENDFEATURETITLE>
    Body
    <EOF>;
FeatureName: <FEATURE_NAME>;
Body: <BODYSTART> Text <BODYEND>;
Text: (<TEXT>)*;

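But I am sure I must be missing something, and I would like to be able to achieve this without having to annotate the feature file. What is a better way to do this?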


Side note

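FeatureBNF is built on top of JavaCC and outputs a grammar file for JavaCC to process. I am brand new to both FeatureBNF and JavaCC, but they seem very similar, so I hope this question can be answered by JavaCC experts. (FeatureBNF uses JavaCC syntax for the lexical specification and then its own format for the grammar's production rules.)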


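In your first grammar every token lives in the DEFAULT state, so a line break inside the body is always tokenized as <NEWLINE> and never as <TEXT>: JavaCC picks the longest match and, on a tie, the token declared first. That is why the parser rejects the "\r\n" at line 2. Based on your grammar, you can switch lexical state after the first newline, so the following lexical grammar is sufficient: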

<DEFAULT> TOKEN : {
   <FEATURE: "Feature: " >
   | <#LETTER: ["\u0027", "\u0041"-"\u005a", "\u005f", "\u0061"-"\u007a"] >
   | <FEATURE_NAME: (<LETTER>)+ >
   | <ENDFEATURETITLE: "#TITLEEND" >
   | <NEWLINE: ("\r\n" | "\n\r" | "\r" | "\n") > : IN_BODY
   }

<IN_BODY> TOKEN : { <TEXT : ~[] > }

The syntactic grammar is now:

Feature:
    <FEATURE> FeatureName <NEWLINE>
    Body
    <EOF>;
FeatureName: <FEATURE_NAME>;
Body: (<TEXT>)*;
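
To make the state switch concrete, here is a small hand-written Java sketch that imitates the two lexical states used above. It is not JavaCC or FeatureBNF output, and the names TwoStateSketch, Feature and parse are made up for illustration. It assumes the input starts with "Feature: " followed directly by a name made only of the <LETTER> characters and then a line break.

```java
// Hand-rolled imitation of the DEFAULT -> IN_BODY switch; illustrative only.
final class TwoStateSketch {

    enum State { DEFAULT, IN_BODY }

    /** Hypothetical result holder: the feature name and the raw body text. */
    static final class Feature {
        final String name;
        final String body;
        Feature(String name, String body) { this.name = name; this.body = body; }
    }

    private static final String FEATURE = "Feature: ";

    // Same character set as the #LETTER definition in the grammar.
    private static boolean isLetter(char c) {
        return c == '\'' || c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    static Feature parse(String input) {
        State state = State.DEFAULT;

        // <FEATURE>
        if (!input.startsWith(FEATURE)) {
            throw new IllegalArgumentException("expected <FEATURE> at offset 0");
        }
        int pos = FEATURE.length();

        // <FEATURE_NAME>: one or more letters
        int nameStart = pos;
        while (pos < input.length() && isLetter(input.charAt(pos))) {
            pos++;
        }
        if (pos == nameStart) {
            throw new IllegalArgumentException("expected <FEATURE_NAME> at offset " + pos);
        }
        String name = input.substring(nameStart, pos);

        // <NEWLINE>: prefer the two-character forms, then switch to IN_BODY
        if (input.startsWith("\r\n", pos) || input.startsWith("\n\r", pos)) {
            pos += 2;
            state = State.IN_BODY;
        } else if (pos < input.length()
                && (input.charAt(pos) == '\r' || input.charAt(pos) == '\n')) {
            pos += 1;
            state = State.IN_BODY;
        }
        if (state != State.IN_BODY) {
            throw new IllegalArgumentException("expected <NEWLINE> at offset " + pos);
        }

        // In IN_BODY every character, line breaks included, is a <TEXT> token.
        StringBuilder body = new StringBuilder();
        while (pos < input.length()) {
            body.append(input.charAt(pos++));
        }
        return new Feature(name, body.toString());
    }

    public static void main(String[] args) {
        Feature f = parse("Feature: Calculator\n\n   In order to avoid silly mistakes\n");
        System.out.println("name = " + f.name);
        System.out.println("body length = " + f.body.length());
    }
}
```

The point of the sketch is that once the first line break has been consumed, nothing in the body can be mistaken for <NEWLINE> or <FEATURE_NAME>, because those tokens are only defined in the DEFAULT state.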