为表达式生成解析器

Generating a parser for expressions

我正在尝试为 Microsoft article 中描述的 ebnf 语法生成一个 javascript 解析器。文章中指定的 ebnf 当我按照它写的那样使用时不起作用,所以我试图简化它以使其与 REx 解析器生成器一起使用。

Javascript 的目标是能够解析这些表达式并将其评估为 True 或 False:

我在这里在线使用 REx 解析器生成器:https://bottlecaps.de/rex/。如果您对其他生成 JavaScript 的解析器生成器有建议,我将不胜感激一些指向我可以找到它们的位置的链接。

我正在努力解决的问题是 MethodCall 的定义。我尝试了很多不同的定义,但都失败了。当我删除 MethodCall 和 MethodArgs 定义时,REx 解析器生成器会生成一个解析器。

所以,如果能帮助解决这个问题,我将不胜感激。

以下是我所能得到的语法。

Expression
    ::= BinaryExpression | MethodCall | "(" Expression ")" | Number
BinaryExpression
    ::= RelationalExpression ( ( '=' | '!=' ) RelationalExpression )*
RelationalExpression
    ::= AdditiveExpression ( ( '<' | '>' | '<=' | '>=' | 'and' | 'or' ) AdditiveExpression )*
AdditiveExpression
    ::= MultiplicativeExpression ( ( '+' | '-' ) MultiplicativeExpression )*
MultiplicativeExpression
    ::= UnaryExpression ( ( '*' | '/' | '%' ) UnaryExpression )*
UnaryExpression
    ::= "!" Identifier | "+" Identifier | "-" Identifier | Identifier

MethodCall
    ::= Identifier "(" MethodArgs* ")"
MethodArgs
    ::= Expression | Expression "," MethodArgs

Identifier
    ::= Letter ( Letter | Digit | "_" )*
Number
    ::= Digit ('.' Digit) | (Digit)*

<?TOKENS?>

Letter
    ::= [A-Za-z]
Digit
    ::= [0-9]

这是我尝试过但都失败的 MethodCall 定义的一些不同版本。

MethodCall
    ::= Identifier "(" MethodArgs? ")"
MethodArgs
    ::= Expression ("," MethodArgs)*

MethodCall
    ::= Identifier "(" MethodArgs ")"
MethodArgs
    ::= ( Expression ("," MethodArgs)* )?

MethodCall
    ::= MethodName "(" MethodArgs? ")"
MethodName
    ::= Identifier
MethodArgs
    ::= Expression ("," MethodArgs)*

MethodCall
    ::= Identifier "(" MethodArgs ")"
MethodArgs
    ::= Expression ("," MethodArgs)* | ""

MethodCall
    ::= Identifier MethodArgs
MethodArgs
    ::= "(" (Expression ("," MethodArgs)* | "")  ")"

MethodCall
    ::= Identifier "(" MethodArgs ")"
MethodArgs
    ::= Expression | Expression "," MethodArgs | ""

MethodCall
    ::= Identifier "(" Expression ")"

我试图从许多其他语言中获得灵感,看看它们是如何做到的,但到目前为止还没有成功:

更新

只是想让你知道结果如何。我努力让 EBNF 语法来做我需要它做的事情,所以我看了一下 Nearley like @rici suggested and converted my grammar into the Nearley grammar syntax and got the tooling to work. It was just a much better choice for my project. The documentation is very good, the tooling is also great and the error messages is very helpful. So a huge thanks to @rici 以推荐 Nearley。

下面是我实现的语法。我已经使用以下输入进行了测试:

'2 + 4', '2 + 4 - 6', '(2 + 4)', '!true', '!(true)', 'hasCategory(test)', 'hasCategory(test,test2)', 'hasCategory( test , test2 )', 'hasCategory(test,test2, test3)', 'IsReference', 'IsReference()', '2 * 4', '(2 / 4)', '2 * 4 + 2', '(2 / 4) + 2', '2 > 4', '2 >= 2', '2 = 4', '2 == 2', '2 != 4', '2 !== 2', '(2 * 4 + 2) > 4', '(2 * 4 + 2) > (4 + 10)', 'true', 'true or false', 'true || false', 'true and false', 'true && false', '(true or false)', '!(true or false)', '2 != 1+1', '2 != (1+1)', '2 != (1+2)', '(2 > 2 or (2 != 1+1))',

@builtin "whitespace.ne" # `_` means arbitrary amount of whitespace
@builtin "number.ne"     # `int`, `decimal`, and `percentage` number primitives
@builtin "postprocessors.ne"

@{%
function methodCall(nameId, argsId = -1) {
  return function(data) {
      return {
          type: 'methodCall',
          name: data[nameId],
          args: argsId == -1 ? [] : data[argsId]
      };
    }
}

function value() {
  return function(data) {
      return {
          type: 'value',
          value: data[0]
      };
    }
}
%}
expression -> 
    methodCall {% id %}
  | relationalExpression {% value() %}
  | booleanExpression  {% value() %}
  | _ identifier _  {% methodCall(1) %}

booleanExpression ->
      parentheses {% id %}
    | parentheses _ "and"i _ parentheses {% d => d[0] && d[4] %}
    | parentheses _ "&&" _ parentheses {% d => d[0] && d[4] %}
    | parentheses _ "or"i _ parentheses {% d => d[0] || d[4] %}
    | parentheses _ "||" _ parentheses {% d => d[0] || d[4] %}

parentheses ->
    _ "(" relationalExpression ")" _ {% nth(2) %}
  |  _ "(" booleanExpression ")" _ {% nth(2) %}
  | unaryExpression {% id %}

relationalExpression -> 
      _ additiveExpression _ {% nth(1) %}
    | relationalExpression _ "=" _ additiveExpression {% d => d[0] == d[4] %}
    | relationalExpression _ "==" _ additiveExpression {% d => d[0] == d[4] %}
    | relationalExpression _ "!=" _ additiveExpression {% d => d[0] != d[4] %}
    | relationalExpression _ "!==" _ additiveExpression {% d => d[0] != d[4] %}
    | relationalExpression _ "<" _ additiveExpression {% d => d[0] < d[4] %}
    | relationalExpression _ ">" _ additiveExpression {% d => d[0] > d[4] %}
    | relationalExpression _ "<=" _ additiveExpression {% d => d[0] <= d[4] %}
    | relationalExpression _ ">=" _ additiveExpression {% d => d[0] >= d[4] %}

additiveExpression -> 
    _ multiplicativeExpression _ {% nth(1) %}
  | additiveExpression _ "+" _ multiplicativeExpression {% d => d[0] + d[4] %}
  | additiveExpression _ "-" _ multiplicativeExpression {% d => d[0] - d[4] %}

multiplicativeExpression ->
    _ parentheses _  {% nth(1) %}
  | parentheses _ "*" _ parentheses {% d => d[0] * d[4] %}
  | parentheses _ "/" _ parentheses {% d => d[0] / d[4] %}
  | parentheses _ "%" _ parentheses {% d => d[0] % d[4] %}

unaryExpression ->  
    _ "!" _ expression _ {% d => !d[3] %}
  | _ decimal _ {% nth(1) %}
  | _ unsigned_int _ {% nth(1) %}
  | _ boolean _ {% nth(1) %}
  | _ identifier _ {% nth(1) %}

methodCall -> 
      identifier "(" methodArgs ")" {% methodCall(0, 2) %}
    | identifier "(" _ ")" {% methodCall(0) %}

methodArgs ->
    _ identifier _  {% d => [d[1]] %}
  | _ identifier _ "," _ methodArgs _ {% d => [d[1]].concat(d[5]) %}

boolean ->
    "true"i {% () => true %} 
  | "false"i {% () => false %}

identifier -> 
  [A-Za-z0-9_]:+ {% (data, l, reject) => {
    var ident = data[0].join('');
    if (ident.toLowerCase() === 'true' || ident.toLowerCase() === 'false') {
      return reject;
    } else {
      return ident;
    }
  }
   %}

你的语法有一些问题,但基本上没问题。

  • 'and' 和 'or' 与 Identifier 冲突。因此,从其规则中的 Identifier 中减去那些字符串文字。
  • Number 缺少括号。应该是Number ::= Digit ( ('.' Digit) | (Digit)* )
  • 您缺少 EOF 规则。我知道的几乎每个解析器生成器都需要一个 bottom/EOF 规则来强制消耗整个输入。我添加了规则“输入”。
  • 确保单击“配置”框,然后单击“回溯”。你的语法有歧义,这很好,但需要你告诉解析器生成器来处理它。

解析器生成器对“EBNF”的语法略有不同,这是 REx 采用的语法。 REx 添加了一个 <?TOKENS?> 字符串来表示解析器和词法分析器规则之间的边界。微软说语法是“BNF”,但这不是因为它使用了 Kleene 运算符 <Identifier> ::= [^. ]*,一种 EBNF 结构。它还用散文捏造了 <Literal><Number> 的定义。

我没有测试生成的解析器,但它似乎是一个简单的递归下降实现。 conversion 页面列出了我熟悉且流行的解析器生成器。 (我正在为所有这些以及更多编写转换器。)

试试这个:

Input ::= Expression EOF

Expression
    ::= BinaryExpression | MethodCall | "(" Expression ")" | Number

BinaryExpression
    ::= RelationalExpression ( ( '=' | '!=' ) RelationalExpression )*
RelationalExpression
    ::= AdditiveExpression ( ( '<' | '>' | '<=' | '>=' | 'and' | 'or' ) AdditiveExpression )*
AdditiveExpression
    ::= MultiplicativeExpression ( ( '+' | '-' ) MultiplicativeExpression )*
MultiplicativeExpression
    ::= UnaryExpression ( ( '*' | '/' | '%' ) UnaryExpression )*
UnaryExpression
    ::= "!" Identifier | "+" Identifier | "-" Identifier | Identifier

MethodCall
    ::= Identifier "(" MethodArgs* ")"
MethodArgs
    ::= Expression | Expression "," MethodArgs

<?TOKENS?>

Identifier
    ::= ( ( Letter ( Letter | Digit | "_" )* ) - ( 'and' | 'or' ) )
Number ::= Digit ( ('.' Digit) | (Digit)* )

Letter
    ::= [A-Za-z]
Digit
    ::= [0-9]

EOF      ::= $