使用递归下降解析器验证 "break" 语句
Validating a "break" statement with a recursive descent parser
在 Crafting Interpreters 中,我们使用递归下降解析器实现了一种小型编程语言。在许多其他事情中,它有这些声明:
statement → exprStmt
| ifStmt
| printStmt
| whileStmt
| block ;
block → "{" declaration* "}" ;
whileStmt → "while" "(" expression ")" statement ;
ifStmt → "if" "(" expression ")" statement ( "else" statement )? ;
练习之一是add a break
statement语言。此外,将此语句置于循环之外应该是语法错误。当然,它可以出现在其他块内,if
语句等,如果它们在循环内。
我的第一个方法是创建一个新规则 whileBody
,以接受 break
:
## FIRST TRY
statement → exprStmt
| ifStmt
| printStmt
| whileStmt
| block ;
block → "{" declaration* "}" ;
whileStmt → "while" "(" expression ")" whileBody ;
whileBody → statement
| break ;
break → "break" ";" ;
ifStmt → "if" "(" expression ")" statement ( "else" statement )? ;
但是我们必须在嵌套循环中接受 break
,if
条件等。我能想象的是,我需要一个新的规则来接受 break
:
## SECOND TRY
statement → exprStmt
| ifStmt
| printStmt
| whileStmt
| block ;
block → "{" declaration* "}" ;
whileStmt → "while" "(" expression ")" whileBody ;
whileBody → statement
| break
| whileBlock
| whileIfStmt
whileBlock→ "{" (declaration | break)* "}" ;
whileIfStmt → "if" "(" expression ")" whileBody ( "else" whileBody )? ;
break → "break" ";"
ifStmt → "if" "(" expression ")" statement ( "else" statement )? ;
目前并非不可行,但一旦语言发展起来,处理它会很麻烦。写到今天还是枯燥易错!
我在 C and Java BNF specifications. Apparently, none of those specifications prohibit the break
outside loop. I guess their parsers have ad hoc code to prevent that. So, I followed suit and added code into the parser to prevent break
outside loops 中寻找灵感。
TL;DR
我的问题是:
- 我第二次尝试的方法是否有效?换句话说,递归下降解析器可以处理只出现在循环内的
break
语句吗?
- 是否有更实用的方法在语法规范中烘焙
break
命令?
- 或者标准方法确实是更改解析器以防止在解析时在循环外中断?
- Would the approach of my second try even work? In other words, could a recursive descent parser handle a
break
statement that only appears inside loops?
当然可以。但是你需要大量的重复。由于 while
不是唯一的循环结构,我使用了一种不同的方式来描述备选方案,其中包括将 _B
添加到可能包含 break
语句的非终结符的名称中.
declaration → varDecl
| statement
declaration_B → varDecl
| statement_B
statement → exprStmt
| ifStmt
| printStmt
| whileStmt
| block
statement_B → exprStmt
| printStmt
| whileStmt
| breakStmt
| ifStmt_B
| block_B
breakStmt → "break" ";"
ifStmt → "if" "(" expression ")" statement ( "else" statement )?
ifStmt_B → "if" "(" expression ")" statement_B ( "else" statement_B )?
whileStmt → "while" "(" expression ")" statement_B ;
block → "{" declaration* "}"
block_B → "{" declaration_B* "}"
并非所有语句类型都需要复制。像 exprStmt
这样的非复合语句不需要,因为它们不可能包含 break
语句(或任何其他语句类型)。 statement
是循环语句的目标,如 whileStmt
可以始终包含 break
,无论 while
是否在循环内。
- Is there a more practical way to bake the break command inside a syntax specification?
除非您的语法规范具有标记宏,例如用于描述 ECMAScript 的规范。
- Is there a different way to do this?
由于这是一个自上而下的(递归下降)解析器,因此在解析器的执行过程中处理这种情况非常简单。您只需要向每个(或许多)解析函数添加一个参数,指定是否可以中断。由 whileStmt
调用的任何解析函数都会将该参数设置为 True
(或表明可以中断的枚举),而其他语句类型只会传递参数,顶级解析函数会将参数设置为 False
。如果使用 False
.
调用,breakStmt
实现只会 return 失败
属性语法擅长这种事情。定义一个继承的属性(我称它为 LC,表示循环计数)。 'program' non-terminal 将 LC = 0 传递给它的 children;循环将 LC = $LC + 1 传递给它们的 children;所有其他构造将 LC = $LC 传递给它们的 children。仅当 $LC > 0.
时,使 'break' 的规则在语法上有效
属性语法或在守卫中使用属性值没有标准语法(正如我建议 'break'),但使用 Prolog definite-clause 语法符号你的语法可能看起来像下列。我已经添加了一些关于 DCG 符号的注释,以防你使用它们已经太久了。
/* nt(X) means, roughly, pass the value X as an inherited attribute.
** In a recursive-descent system, it can be passed as a parameter.
** N.B. in definite-clause grammars, semicolon separates alternatives,
** and full stop ends a rule.
*/
/* DCD doesn't have regular-right-part rules, so we have to
** handle repetition via recursion.
*/
program -->
statement(0);
statement(0), program.
statement(LC) -->
exprStmt(LC);
ifStmt(LC);
printStmt(LC);
whileStmt(LC);
block(LC);
break(LC).
block(LC) -->
"{", star-declaration(LC), "}".
/* The notation [] denotes the empty list, and matches zero
** tokens in the input.
*/
star-declaration(LC) -->
[];
declaration(LC), star-declaration(LC).
/* On the RHS of a rule, braces { ... } contain Prolog code. Here,
** the code "LC2 is LC + 1" adds 1 to LC and binds LC2 to that value.
*/
whileStmt(LC) -->
{ LC2 is LC + 1 }, "while", "(", expression(LC2), ")", statement(LC2).
ifStmt(LC) --> "if", "(", expression(LC), ")", statement(LC), opt-else(LC).
opt-else(LC) -->
"else", statement(LC);
[].
/* The definition of break checks the value of the loop count:
** "LC > 0" succeeds if LC is greater than zero, and allows the
** parse to succeed. If LC is not greater than zero, the expression
** fails. And since there is no other rule for 'break', any attempt
** to parse a 'break' rule when LC = 0 will fail.
*/
break(LC) --> { LC > 0 }, "break", ";".
可以在 Grune 和 Jacobs 的 Parsing Techniques 和 Springer volumes Lecture Notes in Computer Science 461(Attribute Grammars and Their Applications*, ed . P. Deransart 和 M. Jourdan)和 545(属性语法、应用程序和系统,编者 H. Alblas 和 B. Melichar。
复制一些产生式以区分两种情况(我在循环中吗?或不在循环中?)的技术,如@rici 的答案所示,可以被视为一种将布尔属性推入的方法non-terminal 个名字。
在 Crafting Interpreters 中,我们使用递归下降解析器实现了一种小型编程语言。在许多其他事情中,它有这些声明:
statement → exprStmt
| ifStmt
| printStmt
| whileStmt
| block ;
block → "{" declaration* "}" ;
whileStmt → "while" "(" expression ")" statement ;
ifStmt → "if" "(" expression ")" statement ( "else" statement )? ;
练习之一是add a break
statement语言。此外,将此语句置于循环之外应该是语法错误。当然,它可以出现在其他块内,if
语句等,如果它们在循环内。
我的第一个方法是创建一个新规则 whileBody
,以接受 break
:
## FIRST TRY
statement → exprStmt
| ifStmt
| printStmt
| whileStmt
| block ;
block → "{" declaration* "}" ;
whileStmt → "while" "(" expression ")" whileBody ;
whileBody → statement
| break ;
break → "break" ";" ;
ifStmt → "if" "(" expression ")" statement ( "else" statement )? ;
但是我们必须在嵌套循环中接受 break
,if
条件等。我能想象的是,我需要一个新的规则来接受 break
:
## SECOND TRY
statement → exprStmt
| ifStmt
| printStmt
| whileStmt
| block ;
block → "{" declaration* "}" ;
whileStmt → "while" "(" expression ")" whileBody ;
whileBody → statement
| break
| whileBlock
| whileIfStmt
whileBlock→ "{" (declaration | break)* "}" ;
whileIfStmt → "if" "(" expression ")" whileBody ( "else" whileBody )? ;
break → "break" ";"
ifStmt → "if" "(" expression ")" statement ( "else" statement )? ;
目前并非不可行,但一旦语言发展起来,处理它会很麻烦。写到今天还是枯燥易错!
我在 C and Java BNF specifications. Apparently, none of those specifications prohibit the break
outside loop. I guess their parsers have ad hoc code to prevent that. So, I followed suit and added code into the parser to prevent break
outside loops 中寻找灵感。
TL;DR
我的问题是:
- 我第二次尝试的方法是否有效?换句话说,递归下降解析器可以处理只出现在循环内的
break
语句吗? - 是否有更实用的方法在语法规范中烘焙
break
命令? - 或者标准方法确实是更改解析器以防止在解析时在循环外中断?
- Would the approach of my second try even work? In other words, could a recursive descent parser handle a
break
statement that only appears inside loops?
当然可以。但是你需要大量的重复。由于 while
不是唯一的循环结构,我使用了一种不同的方式来描述备选方案,其中包括将 _B
添加到可能包含 break
语句的非终结符的名称中.
declaration → varDecl
| statement
declaration_B → varDecl
| statement_B
statement → exprStmt
| ifStmt
| printStmt
| whileStmt
| block
statement_B → exprStmt
| printStmt
| whileStmt
| breakStmt
| ifStmt_B
| block_B
breakStmt → "break" ";"
ifStmt → "if" "(" expression ")" statement ( "else" statement )?
ifStmt_B → "if" "(" expression ")" statement_B ( "else" statement_B )?
whileStmt → "while" "(" expression ")" statement_B ;
block → "{" declaration* "}"
block_B → "{" declaration_B* "}"
并非所有语句类型都需要复制。像 exprStmt
这样的非复合语句不需要,因为它们不可能包含 break
语句(或任何其他语句类型)。 statement
是循环语句的目标,如 whileStmt
可以始终包含 break
,无论 while
是否在循环内。
- Is there a more practical way to bake the break command inside a syntax specification?
除非您的语法规范具有标记宏,例如用于描述 ECMAScript 的规范。
- Is there a different way to do this?
由于这是一个自上而下的(递归下降)解析器,因此在解析器的执行过程中处理这种情况非常简单。您只需要向每个(或许多)解析函数添加一个参数,指定是否可以中断。由 whileStmt
调用的任何解析函数都会将该参数设置为 True
(或表明可以中断的枚举),而其他语句类型只会传递参数,顶级解析函数会将参数设置为 False
。如果使用 False
.
breakStmt
实现只会 return 失败
属性语法擅长这种事情。定义一个继承的属性(我称它为 LC,表示循环计数)。 'program' non-terminal 将 LC = 0 传递给它的 children;循环将 LC = $LC + 1 传递给它们的 children;所有其他构造将 LC = $LC 传递给它们的 children。仅当 $LC > 0.
时,使 'break' 的规则在语法上有效属性语法或在守卫中使用属性值没有标准语法(正如我建议 'break'),但使用 Prolog definite-clause 语法符号你的语法可能看起来像下列。我已经添加了一些关于 DCG 符号的注释,以防你使用它们已经太久了。
/* nt(X) means, roughly, pass the value X as an inherited attribute.
** In a recursive-descent system, it can be passed as a parameter.
** N.B. in definite-clause grammars, semicolon separates alternatives,
** and full stop ends a rule.
*/
/* DCD doesn't have regular-right-part rules, so we have to
** handle repetition via recursion.
*/
program -->
statement(0);
statement(0), program.
statement(LC) -->
exprStmt(LC);
ifStmt(LC);
printStmt(LC);
whileStmt(LC);
block(LC);
break(LC).
block(LC) -->
"{", star-declaration(LC), "}".
/* The notation [] denotes the empty list, and matches zero
** tokens in the input.
*/
star-declaration(LC) -->
[];
declaration(LC), star-declaration(LC).
/* On the RHS of a rule, braces { ... } contain Prolog code. Here,
** the code "LC2 is LC + 1" adds 1 to LC and binds LC2 to that value.
*/
whileStmt(LC) -->
{ LC2 is LC + 1 }, "while", "(", expression(LC2), ")", statement(LC2).
ifStmt(LC) --> "if", "(", expression(LC), ")", statement(LC), opt-else(LC).
opt-else(LC) -->
"else", statement(LC);
[].
/* The definition of break checks the value of the loop count:
** "LC > 0" succeeds if LC is greater than zero, and allows the
** parse to succeed. If LC is not greater than zero, the expression
** fails. And since there is no other rule for 'break', any attempt
** to parse a 'break' rule when LC = 0 will fail.
*/
break(LC) --> { LC > 0 }, "break", ";".
可以在 Grune 和 Jacobs 的 Parsing Techniques 和 Springer volumes Lecture Notes in Computer Science 461(Attribute Grammars and Their Applications*, ed . P. Deransart 和 M. Jourdan)和 545(属性语法、应用程序和系统,编者 H. Alblas 和 B. Melichar。
复制一些产生式以区分两种情况(我在循环中吗?或不在循环中?)的技术,如@rici 的答案所示,可以被视为一种将布尔属性推入的方法non-terminal 个名字。