Setting the maximum number of occurrences with `delimitedList` using pyparsing
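pyparsing provides a helper function, delimitedList, that matches a sequence of one or more expressions separated by a delimiter:
delimitedList(expr, delim=',', combine=False)
How can this be used to match a sequence of expressions where each expression may occur zero or one times?
For example, to match "foo", "bar", "baz", I took a bottom-up approach and created a token for each word: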
import pyparsing as pp
dbl_quote = pp.Suppress('"')
foo = dbl_quote + pp.Literal('foo') + dbl_quote
bar = dbl_quote + pp.Literal('bar') + dbl_quote
baz = dbl_quote + pp.Literal('baz') + dbl_quote
I want to create an expression that matches:
zero or one occurrences of "foo",
zero or one occurrences of "bar",
zero or one occurrences of "baz"
... in any order. Examples of valid input:
"foo", "bar", "baz"
"baz", "bar", "foo",    // order is unimportant
"bar", "baz"            // zero occurrences allowed
"baz"
// zero occurrences of all tokens
Examples of invalid input:
"notfoo", "notbar", "notbaz"
"foo", "foo", "bar", "baz"    // foo occurs twice
"foo" "bar", "baz"            // missing comma
"foo" "bar", "baz",           // trailing comma
I gravitated towards delimitedList because my input is a comma-delimited list, but now I feel this function is working against me rather than for me.
import pyparsing as pp
dbl_quote = pp.Suppress('"')
foo = dbl_quote + pp.Literal('foo') + dbl_quote
bar = dbl_quote + pp.Literal('bar') + dbl_quote
baz = dbl_quote + pp.Literal('baz') + dbl_quote
# This is NOT what I want because it allows tokens
# to occur more than once.
foobarbaz = pp.delimitedList(foo | bar | baz)
if __name__ == "__main__":
    TEST = '"foo", "bar", "baz"'
    results = foobarbaz.parseString(TEST)
    results.pprint()
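Usually, when I see "in any order" as part of a grammar, my first thought is to use Each, which you can create with the & operator:
undelimited_foo_bar_baz = foo & bar & baz
This parser will parse foo, bar, and baz in any order. If you want them to be optional, just wrap each one in Optional:
undelimited_foo_bar_baz = pp.Optional(foo) & pp.Optional(bar) & pp.Optional(baz)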
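As a rough illustration of the Each approach, here is a minimal sketch that assumes whitespace-separated input with no commas. The `accepts` helper is hypothetical and only exists to turn a parse attempt into a True/False answer; `parseAll=True` is used so that leftover input counts as a failure.

```python
import pyparsing as pp

dbl_quote = pp.Suppress('"')
foo = dbl_quote + pp.Literal('foo') + dbl_quote
bar = dbl_quote + pp.Literal('bar') + dbl_quote
baz = dbl_quote + pp.Literal('baz') + dbl_quote

# Each item is optional, may appear in any order, and at most once.
undelimited_foo_bar_baz = pp.Optional(foo) & pp.Optional(bar) & pp.Optional(baz)


def accepts(text):
    """Hypothetical helper: True if the whole string parses."""
    try:
        undelimited_foo_bar_baz.parseString(text, parseAll=True)
        return True
    except pp.ParseException:
        return False


print(accepts('"baz" "foo"'))        # any order, bar omitted
print(accepts('"foo" "foo" "bar"'))  # foo repeated
print(accepts('"foo", "bar"'))       # commas are not part of this grammar
```

The first input should be accepted and the other two rejected: the repeated foo leaves unparsed input behind, and the comma is simply not something this expression knows how to consume.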
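But the intervening commas in the input get in the way of this, so as a fallback you can stick with delimitedList (which strips out the commas) and add a condition parse action that runs after the list has been parsed, to verify that each matched item is present only once: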
from collections import Counter

def no_more_than_one_of_any(t):
    # True only if every parsed item occurs exactly once
    return all(freq == 1 for freq in Counter(t.asList()).values())

foobarbaz.addCondition(no_more_than_one_of_any, message="duplicate item found in list")

if __name__ == "__main__":
    tests = '''\
    "foo"
    "bar"
    "baz"
    "foo", "baz"
    "foo", "bar", "baz"
    "foo", "bar", "baz", "foo"
    '''
    foobarbaz.runTests(tests)
Prints:
"foo"
['foo']
"bar"
['bar']
"baz"
['baz']
"foo", "baz"
['foo', 'baz']
"foo", "bar", "baz"
['foo', 'bar', 'baz']
"foo", "bar", "baz", "foo"
^
FAIL: duplicate item found in list, found '"' (at char 0), (line:1, col:1)
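Outside of runTests, the same check can be exercised directly with parseString. The sketch below is self-contained and assumes `parseAll=True` is wanted, so that a missing or trailing comma is rejected as well as a duplicate; `parse_or_none` is a hypothetical helper, not part of pyparsing.

```python
from collections import Counter

import pyparsing as pp

dbl_quote = pp.Suppress('"')
foo = dbl_quote + pp.Literal('foo') + dbl_quote
bar = dbl_quote + pp.Literal('bar') + dbl_quote
baz = dbl_quote + pp.Literal('baz') + dbl_quote

foobarbaz = pp.delimitedList(foo | bar | baz)
# Reject the parsed list if any item shows up more than once.
foobarbaz.addCondition(
    lambda t: all(freq == 1 for freq in Counter(t.asList()).values()),
    message="duplicate item found in list",
)


def parse_or_none(text):
    """Hypothetical helper: return the parsed list, or None on failure."""
    try:
        return foobarbaz.parseString(text, parseAll=True).asList()
    except pp.ParseException:
        return None


print(parse_or_none('"baz", "foo"'))          # order does not matter
print(parse_or_none('"foo", "foo", "bar"'))   # duplicate item
print(parse_or_none('"foo" "bar", "baz"'))    # missing comma
print(parse_or_none('"foo", "bar", "baz",'))  # trailing comma
```

Only the first call should return a list; the other three should return None. Note that without `parseAll=True`, parseString stops at the first point where the list can no longer continue, so the missing-comma and trailing-comma inputs would parse a prefix instead of failing.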