Stopping Raku grammar at EOS (End of String)
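While translating one music language into another (ABC to Alda) as an excuse to learn Raku's DSL abilities, I noticed that there doesn't seem to be a way to terminate a .parse! Here is my shortened demo code: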
#!/home/hsmyers/rakudo741/bin/perl6
use v6d;
# use Grammar::Debugger;
use Grammar::Tracer;
my $test-n01 = q:to/EOS/;
a b c d e f g
A B C D E F G
EOS
grammar test {
    token TOP { <score>+ }
    token score {
        <.ws>?
        [
            | <uc>
            | <lc>
        ]+
        <.ws>?
    }
    token uc { <[A..G]> }
    token lc { <[a..g]> }
}
test.parse($test-n01).say;
The last part of the Grammar::Tracer output illustrates my problem.
| score
| | uc
| | * MATCH "G"
| * MATCH "G\n"
| score
| * FAIL
* MATCH "a b c d e f g\nA B C D E F G\n"
「a b c d e f g
A B C D E F G
」
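In the second-to-last line, the word FAIL tells me that the .parse run has no way of quitting. Is that correct? The .say displays everything as it should be, so I'm not clear on how real the FAIL is. The question remains: "How do I correctly write a grammar that parses multiple lines without error?"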
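When you use the grammar debugger, it lets you see exactly how the engine is parsing the string; fails are normal and expected. Consider, for example, matching a+b* against the string aab. You should get two matches for 'a', followed by a fail (because b is not a), and then the engine retries with b and matches successfully.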
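This may be easier to see if you use || for alternation (which enforces the order in which the alternatives are tried). If you have
    rule TOP { I have a <fruit> }
    token fruit { apple || orange || kiwi }
and you parse the sentence "I have a kiwi", you'll see it first match "I have a", followed by two fails on "apple" and "orange", and finally a match on "kiwi".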
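As a minimal sketch of the fruit example, the fragment can be wrapped in a complete grammar and run. The grammar name Fruit is hypothetical, and TOP is declared as a rule here on the assumption that the spaces between the words of the sentence should be matched:

```raku
# Hypothetical grammar name; the two productions come from the example above.
grammar Fruit {
    # A rule matches optional whitespace wherever the pattern has whitespace.
    rule  TOP   { I have a <fruit> }
    # || tries the alternatives strictly left to right.
    token fruit { apple || orange || kiwi }
}

my $match = Fruit.parse('I have a kiwi');
say $match<fruit>.Str;   # kiwi
```

Adding `use Grammar::Tracer;` at the top of the script should show the two failed attempts (apple, orange) before the successful one, even though the parse as a whole succeeds.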
Now let's look at your case:
TOP # Trying to match top (need >=1 match of score)
| score # Trying to match score (need >=1 match of lc/uc)
| | lc # Trying to match lc
| | * MATCH "a" # lc had a successful match! ("a")
| * MATCH "a " # and as a result so did score! ("a ")
| score # Trying to match score again (because <score>+)
| | lc # Trying to match lc
| | * MATCH "b" # lc had a successful match! ("b")
| * MATCH "b " # and as a result so did score! ("b ")
…………… # …so forth and so on until…
| score # Trying to match score again (because <score>+)
| | uc # Trying to match uc
| | * MATCH "G" # uc had a successful match! ("G")
| * MATCH "G\n" # and as a result, so did score! ("G\n")
| score # Trying to match *score* again (because <score>+)
| * FAIL # failed to match score, because no lc/uc.
|
| # <-------------- At this point, the question is, did TOP match?
| # Remember, TOP is <score>+, so we match TOP if there
| # was at least one <score> token that matched, there was so...
|
* MATCH "a b c d e f g\nA B C D E F G\n" # this is the TOP match
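The fail here is normal: at some point we run out of <score> tokens, so a fail is inevitable. When that happens, the grammar engine moves on to whatever comes after <score>+ in your grammar. Since there is nothing after it, that fail actually results in a match of the entire string (because .parse matches TOP with an implicit /^…$/ around it).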
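The implicit anchoring can be checked directly by comparing .parse with .subparse, which only anchors at the start of the string. The following is a minimal sketch under that assumption; the grammar name Notes and the second input string are made up for illustration, and the productions are those of the question:

```raku
# Hypothetical grammar name; same productions as the question's grammar.
grammar Notes {
    token TOP   { <score>+ }
    token score { <.ws>? [ <uc> | <lc> ]+ <.ws>? }
    token uc    { <[A..G]> }
    token lc    { <[a..g]> }
}

# All input consumed: the final failed <score> attempt is harmless.
my $ok  = Notes.parse("a b c\nA B C\n");

# 'X' is not a valid letter, so <score>+ stops early and .parse,
# which must reach the end of the string, reports failure.
my $bad = Notes.parse("a b c\nX Y Z\n");

# .subparse does not require the end of the string to be reached.
my $sub = Notes.subparse("a b c\nX Y Z\n");

say $ok.defined;    # True
say $bad.defined;   # False
say $sub.Str.raku;  # "a b c\n"
```

In other words, a FAIL line in the trace is only a problem when the overall .parse call returns a failed result.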
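Also, you might consider rewriting your grammar with a rule, which inserts <.ws> calls automatically (unless it's important that the separator be exactly one space):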
grammar test {
    rule TOP { <score> + }
    token score {
        [
            | <uc>
            | <lc>
        ]+
    }
    token uc { <[A..G]> }
    token lc { <[a..g]> }
}
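Also, IME, you may want to add a proto token for uc/lc, because when you have [ <foo> | <bar> ] one of the two will always be undefined, which can make processing them in an actions class a bit annoying. You could try: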
grammar test {
    rule TOP { <score> + }
    token score { <letter> + }
    proto token letter { * }
    token letter:uc { <[A..G]> }
    token letter:lc { <[a..g]> }
}
$<letter> will always be defined this way.
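To illustrate why this helps, here is a hedged sketch of an actions class built on a proto token. The names Letters and LetterActions are hypothetical, the candidates use the documented :sym<…> form rather than the shorter spelling above, and whitespace handling in TOP is written out explicitly so the sketch stays self-contained:

```raku
# Hypothetical names; a sketch, not the answer's exact grammar.
grammar Letters {
    token TOP   { [ <score> <.ws> ]+ }
    token score { <letter>+ }
    proto token letter {*}
    token letter:sym<uc> { <[A..G]> }
    token letter:sym<lc> { <[a..g]> }
}

class LetterActions {
    # One action method per proto candidate; no "which one matched?" checks.
    method letter:sym<uc>($/) { make 'upper ' ~ $/.Str }
    method letter:sym<lc>($/) { make 'lower ' ~ $/.Str }
    # $<letter> is always defined here, whichever candidate matched.
    method score($/) { make $<letter>.map(*.made).join(', ') }
    method TOP($/)   { make $<score>.map(*.made).join(' | ') }
}

my $m = Letters.parse("ab C\n", actions => LetterActions);
say $m.made;   # lower a, lower b | upper C
```

The score action never has to test whether $<uc> or $<lc> is defined; the case-specific work lives in the candidate-specific action methods.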