How to identify and extract simple nested tokens with a BNF lexer?
I don't know where to find documentation on this. I only recently learned that most compilers use Backus–Naur form to describe their language.
I took this simple example from the Marpa::R2 Perl package. It parses arithmetic strings such as 42 * 1 + 7:
:default ::= action => [name,values]
lexeme default = latm => 1
Calculator ::= Expression action => ::first
Factor ::= Number action => ::first
Term ::=
    Term '*' Factor action => do_multiply
    | Factor action => ::first
Expression ::=
    Expression '+' Term action => do_add
    | Term action => ::first
Number ~ digits
digits ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
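For context, the grammar above can be driven roughly as in the Marpa::R2 synopsis. This is a minimal sketch, assuming Marpa::R2 is installed; the package name My_Actions and the bodies of do_add and do_multiply do not appear in the question and are filled in here as assumptions.

```perl
use strict;
use warnings;
use Marpa::R2;

# The grammar from the question, unchanged apart from layout.
my $dsl = <<'END_OF_DSL';
:default ::= action => [name,values]
lexeme default = latm => 1

Calculator ::= Expression action => ::first

Factor ::= Number action => ::first
Term ::=
    Term '*' Factor action => do_multiply
    | Factor action => ::first
Expression ::=
    Expression '+' Term action => do_add
    | Term action => ::first
Number ~ digits
digits ~ [\d]+
:discard ~ whitespace
whitespace ~ [\s]+
END_OF_DSL

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# parse() returns a reference to the value of the parse.
my $input     = '42 * 1 + 7';
my $value_ref = $grammar->parse( \$input, 'My_Actions' );
my $result    = ${$value_ref};
print "$result\n";

# Assumed action bodies: the first argument is the per-parse object,
# the rest are the values of the right-hand-side symbols.
sub My_Actions::do_add {
    my ( undef, $left, undef, $right ) = @_;
    return $left + $right;
}

sub My_Actions::do_multiply {
    my ( undef, $left, undef, $right ) = @_;
    return $left * $right;
}
```

Rules written with `::=` are structural (G1) rules, while rules written with `~` are lexical (L0) rules, which is where the tokens are defined.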
I would like to modify it to recursively parse an XML-like sample such as:
<foo>
Some content here
<bar>
I am nested into foo
</bar>
A nested block was before me.
</foo>
and render it as:
>(Some content here)
>>(I am nested into foo)
>(A nested block was before me)
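where I could use this function:
sub block {
    my ($content, $level) = @_;
    my @lines;
    # Prefix each line of $content with one ">" per nesting level
    for my $line (split /\n/, $content) {
        push @lines, (">" x $level) . $line;
    }
    return join "\n", @lines;
}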
Is this a good starting point for me?
There is an open-source Marpa-powered XML parser.
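One possible way to adapt the calculator grammar to the nested sample is sketched below. It is not a real XML parser: it assumes tags have no attributes and text never contains `<`. The symbol names (Document, Element, Contents, Item, OpenTag, CloseTag, Text, tag_name), the My_Actions package, and the render helper are all hypothetical, and block here is a variant of the function from the question that also trims each line and wraps it in parentheses.

```perl
use strict;
use warnings;
use Marpa::R2;

# Hypothetical grammar: an element is an open tag, any mix of text and
# nested elements, and a close tag.
my $dsl = <<'END_OF_DSL';
:default ::= action => ::first
lexeme default = latm => 1

Document ::= Element
Element  ::= OpenTag Contents CloseTag action => do_element
Contents ::= Item*                     action => do_contents
Item     ::= Element | Text

OpenTag  ~ '<' tag_name '>'
CloseTag ~ '</' tag_name '>'
Text     ~ [^<]+
tag_name ~ [\w]+
END_OF_DSL

# Collect the children of an element into an array reference.
sub My_Actions::do_contents {
    my ( undef, @items ) = @_;
    return \@items;
}

# Build a tree node and check that the tag names match.
sub My_Actions::do_element {
    my ( undef, $open, $contents, $close ) = @_;
    my ($open_name)  = $open  =~ /<(\w+)>/;
    my ($close_name) = $close =~ m{</(\w+)>};
    die "Mismatched tags: $open vs $close\n" if $open_name ne $close_name;
    return { name => $open_name, children => $contents };
}

# Variant of block(): one output line per non-empty input line.
sub block {
    my ( $content, $level ) = @_;
    my @out;
    for my $line ( split /\n/, $content ) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '';
        push @out, ( '>' x $level ) . "($line)";
    }
    return @out;
}

# Walk the tree; nested elements are rendered one level deeper.
sub render {
    my ( $element, $level ) = @_;
    my @out;
    for my $item ( @{ $element->{children} // [] } ) {
        push @out, ref($item) ? render( $item, $level + 1 ) : block( $item, $level );
    }
    return @out;
}

my $input = <<'END_OF_INPUT';
<foo>
Some content here
<bar>
I am nested into foo
</bar>
A nested block was before me.
</foo>
END_OF_INPUT

# No whitespace is discarded by the grammar, so trim around the root element.
$input =~ s/\A\s+|\s+\z//g;

my $grammar  = Marpa::R2::Scanless::G->new( { source => \$dsl } );
my $tree_ref = $grammar->parse( \$input, 'My_Actions' );
my @lines    = render( ${$tree_ref}, 1 );
print "$_\n" for @lines;
```

The nesting level is not tracked while parsing. The actions only build a tree, and the level is derived afterwards from the recursion depth in render, which keeps the grammar itself free of counters.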