Perl 6: How to match a character only under some conditions?
I have a file in the following format:
- foo bar - baz
  one two three - or four
  and another line
- next job
  do this - and that
My grammar is:
grammar tasks {
    regex TOP        { \n* <oneTask>+ \n* }
    regex oneTask    { ^^ \- (<oneSection> <endSection>)+ }
    regex oneSection { \N+ }    # this is not quite working
    regex endSection { \n+ }
}
In the regex oneSection, how do I code "I want to match a '-' only when it is not at the beginning of a line"?
I slurp the file into a string and parse that string:
my $content = slurp("taskFile");
my $result = tasks.parse($content);
This is not quite working.
<[\N] - [\-]> does not make the match conditional.
Thanks!!
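Match a dash that is not at the beginning of a line. In Perl 6, [ ] is a non-capturing group rather than a character class, so use a negated lookahead on the line anchor:
<!before ^^> '-'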
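It is easier to write down what you do want to match than to try to exclude something.
What you are looking for is either one character at the beginning of a line that is not a newline or a dash, followed by any number of non-newline characters; or at least one non-newline character at a position that is not the start of a line.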
regex oneSection {
    ||  ^^            # beginning of line
        <-[\n-]>      # not newline or dash
        \N*           # any number of not newlines

    ||  <!before ^^>  # check that this position is not the start of a line
        \N+
}
(This is so complex because you are trying to put the complexity in the wrong place in the grammar.)
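To see how the two alternatives behave, here is a minimal sketch that tries the regex on single lines. The grammar name SectionDemo and the helper is-section are made up for this illustration, and the dash inside the character class is written escaped as \- purely as a precaution.

```raku
# SectionDemo and is-section are hypothetical names used only for this check.
grammar SectionDemo {
    regex oneSection {
        ||  ^^ <-[ \n \- ]> \N*    # line start, first char is neither newline nor dash
        ||  <!before ^^> \N+       # or: we are not at a line start at all
    }
}

# True when the whole string is accepted by oneSection.
sub is-section(Str $line --> Bool) {
    so SectionDemo.parse($line, :rule<oneSection>)
}

say is-section('foo bar - baz');   # True
say is-section('- next job');      # False
```

A dash in the middle of a line is accepted because \N* consumes it like any other character; only a dash in the first column is rejected.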
You could also just match as you currently do, and add a test that fails if the match starts with -.
regex oneSection {
    \N+

    <!{ # fail if the following is True
        $/.starts-with('-')
    }>
}
A Grammar is a type of Class, and a Regex/Token/Rule is a type of Method. So you should probably write them that way, by adding newlines and comments.
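That relationship can be checked directly with type tests. The following is only a small sketch; the grammar Demo and its token word are hypothetical names, not part of the question.

```raku
# Demo and word are made-up names for illustration.
grammar Demo {
    token word { \w+ }
}

say Demo ~~ Grammar;                       # True: a grammar is a class derived from Grammar
say Demo.^find_method('word') ~~ Method;   # True: a token is stored as a method
say Demo.^find_method('word') ~~ Regex;    # True: more specifically, a Regex
say Demo.parse('hello', :rule<word>).Str;  # hello
```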
Writing grammars gets a lot nicer once you learn how to use the % and %% regex operators.
(The difference is that %% can also match a trailing separator.)
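As a rough illustration of that difference, the sketch below matches a comma-separated list with each operator. The input strings and the variable names are assumptions chosen for the example.

```raku
# Hypothetical inputs: the same list with and without a trailing comma.
my $plain    = 'a,b,c';
my $trailing = 'a,b,c,';

my $strict  = rx/ ^ [ \w+ ]+ %  ',' $ /;   # % : separators only between items
my $lenient = rx/ ^ [ \w+ ]+ %% ',' $ /;   # %%: one trailing separator is also allowed

say so $plain    ~~ $strict;    # True
say so $trailing ~~ $strict;    # False
say so $plain    ~~ $lenient;   # True
say so $trailing ~~ $lenient;   # True
```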
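Using % effectively takes some getting used to, so I will show you how I would use it to match your file.
I also changed the separator between sections from just a newline to a newline followed by two spaces. This keeps the leading spaces out of what section matches, which simplifies any further processing.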
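The effect of that separator choice can be seen by comparing two cut-down grammars on one task. This is only a sketch: the grammar names NewlineOnly and NewlinePlusIndent and the sample string are invented for the comparison.

```raku
# A single hypothetical task whose continuation lines are indented by two spaces.
my $task = "- foo\n  bar\n  baz";

grammar NewlineOnly {
    token TOP     { ^^ '- ' <section>+ % \n }       # separator: newline only
    token section { \N+ }
}

grammar NewlinePlusIndent {
    token TOP     { ^^ '- ' <section>+ % "\n  " }   # separator: newline and two spaces
    token section { \N+ }
}

# Join with | so the leading spaces stay visible.
say NewlineOnly.parse($task)<section>.join('|');        # foo|  bar|  baz
say NewlinePlusIndent.parse($task)<section>.join('|');  # foo|bar|baz
```

With the longer separator the indentation is consumed between sections, so no trimming is needed afterwards.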
While you are learning, I recommend using Grammar::Debugger and Grammar::Tracer.
grammar Tasks {
    # use token for its :ratchet behaviour
    # (more performant than regex because it doesn't backtrack)
    token TOP {
        \n*        # ignore any preceding empty lines

        <task>+    # at least one task
        %          # separated by
        \n+        # at least one newline

        \n*        # ignore trailing empty lines
    }

    token task {
        ^^ '- '    # a task starts with 「- 」 at the beginning of a line

        <section>+ # has at least one section
        %          # separated by
        "\n  "     # a newline and two spaces
    }

    token section { \N+ }
}

my $test = q:to/END/;
- foo bar - baz
  one two three - or four
  and another line
- next job
  do this - and that
END

put Tasks.parse( $test, :actions(class {
    method TOP     ($/) { make @<task>».made.List }
    method task    ($/) { make @<section>».made.List }
    method section ($/) {
        make ~$/   # don't do any processing, just make it a Str
    }
})).made.perl;
# (("foo bar - baz", "one two three - or four", "and another line"),
#  ("next job", "do this - and that"))
If I put use Grammar::Tracer; at the top, this is what it outputs:
TOP
| task
| | section
| | * MATCH "foo bar - baz"
| | section
| | * MATCH "one two three - or four"
| | section
| | * MATCH "and another line"
| * MATCH "- foo bar - baz\n  one two three - or four\n  and another l"
| task
| | section
| | * MATCH "next job"
| | section
| | * MATCH "do this - and that"
| * MATCH "- next job\n  do this - and that"
| task
| * FAIL
* MATCH "- foo bar - baz\n  one two three - or four\n  and another line"
The FAIL is expected: there is a trailing newline, and as far as the grammar knows it could be followed by another task.