How to make Perl 6 grammar produce more than one match (like :ex and :ov)?

Question

I would like a grammar to do something like this:

> "abc" ~~ m:ex/^ (\w ** 1..2) (\w ** 1..2) $ {say $0, $1}/
｢ab｣｢c｣
｢a｣｢bc｣

Or like this:

> my regex left { \S ** 1..2  }
> my regex right { \S ** 1..2  }
> "abc" ~~ m:ex/^ <left><right> $ {say $<left>, $<right>}/
｢ab｣｢c｣
｢a｣｢bc｣

Here is my grammar:

grammar LR {
  regex TOP {
    <left> 
    <right>
  }
  regex left {
    \w ** 1..2 
  }
  regex right {
    \w ** 1..2 
  }
}

my $string = "abc";
my $match = LR.parse($string);
say "input: $string";
printf "split: %s|%s\n", ~$match<left>, ~$match<right>;

And here is its output:

$ input: abc
$ split: ab|c

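So <left> only ever matches greedily, taking as much as it can, and only that one split is reported. How should I modify the code so that it matches both possible variants?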

$ input: abc
$ split: a|bc, ab|c

Answer 1

I think Moritz Lenz, nick moritz, author of the upcoming book "Parsing with Perl 6 Regexes and Grammars", is the person to ask about this. I should probably have just let him answer this question...

Notes

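If anyone is considering modifying grammar.parse so that it supports :exhaustive, or hacking on it in some other way to do what @evb wants, the following may be useful inspiration/guidance. I gleaned it from the relevant speculation document (S05) and from searching the #perl6 and #perl6-dev IRC logs.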

7 years ago Moritz added an edit of S05:

A [regex] modifier that affects only the calling behaviour, and not the regex itself [eg :exhaustive] may only appear on constructs that involve a call (like m// [or grammar.parse]), and not on rx// [or regex { ... }].

(The [eg :exhaustive], [or grammar.parse], and [or regex { ... }] bits are extrapolations/interpretations/speculations that I have added in this SO answer. They are not in the linked source.)

5 years ago Moritz expressed interest in implementing :exhaustive for matching (not parsing) features. Less than 2 minutes later jnthn showed a one-liner that demonstrated how he guessed he would approach it. Less than 30 minutes later Moritz posted a working prototype. The final version landed 7 days later.

1 year ago Moritz said on #perl6 (emphasis added by me): "regexes and grammars aren't a good tool to find all possible ways to parse a string".

Hth.

Answer 2

Grammars are designed to give zero or one answers, not more, so you have to use some tricks to make them do what you want.

Since Grammar.parse returns just one Match object, you have to use a different approach to get all the matches:

sub callback($match) {
    say $match;
}
grammar LR {
    regex TOP {
        <left> 
        <right>
        $
        { callback($/) }
        # make the match fail, thus forcing backtracking:
        <!>
    }
    regex left {
        \w ** 1..2 
    }
    regex right {
        \w ** 1..2 
    }
}

LR.parse('abc');

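Making the match fail by calling the <!> assertion (which always fails) forces the preceding atoms to backtrack, and thus to find different solutions. Of course this makes the grammar less reusable, because it works outside the regular calling conventions for grammars.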

Note that, to the caller, LR.parse seems to always fail; you get all the matches as calls to the callback function.

A slightly nicer API (but with the same approach underneath) is to use gather/take to get a sequence of all the matches:

grammar LR {
    regex TOP {
        <left> 
        <right>
        $
        { take $/ }
        # make the match fail, thus forcing backtracking:
        <!>
    }
    regex left {
        \w ** 1..2 
    }
    regex right {
        \w ** 1..2 
    }
}

.say for gather LR.parse('abc');
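
To get output in the "split: ..." shape the question asks for, the same gather/take trick can collect plain strings instead of Match objects. This is only a sketch of one way to do it: taking a stringified "left|right" pair and storing the results in an array named @splits are my additions, not something from the question. The splits come out in whatever order backtracking finds them, which need not be the order shown in the question.

```raku
grammar LR {
    regex TOP {
        <left>
        <right>
        $
        # Stringify right away, so each taken value is a plain Str
        # of the form "left|right" rather than a Match object:
        { take ~$<left> ~ '|' ~ ~$<right> }
        # make the match fail, thus forcing backtracking:
        <!>
    }
    regex left  { \w ** 1..2 }
    regex right { \w ** 1..2 }
}

my $string = "abc";
# Assigning to an array consumes the gather eagerly.
my @splits = gather LR.parse($string);
say "input: $string";
say "split: ", @splits.join(', ');
```

As before, LR.parse itself still fails from the caller's point of view; all the useful information arrives through take.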
