Dealing with ZeroOrMore in pyparsing

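I am trying to parse the output of pactl list with pyparsing. So far all of the parsing works, but I cannot get ZeroOrMore to behave correctly.

I can match foo: or foo: bar, and I tried to handle both with ZeroOrMore, but it does not work. I have to add a special case for "Argument:" to match entries without a value. However, there are also Argument: foo entries (with a value), so the special case will not work in general, and I expect other properties to appear without a value too.

With this definition, and a fixed sample of pactl list output: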

#!/usr/bin/env python

#
# parsing pactl list
#

from pyparsing import *
import os
from subprocess import check_output
import sys

data = '''
Module #6
    Argument:
    Name: module-alsa-card
    Usage counter: 0
    Properties:
        module.author = "Lennart Poettering"
        module.description = "ALSA Card"
        module.version = "14.0-rebootstrapped"
'''

indentStack = [1]
stmt = Forward()

identifier = Word(alphanums+"-_.")

sect_def = Group(Group(identifier) + Suppress("#") + Group(Word(nums)))
inner_section = indentedBlock(stmt, indentStack)
section = (sect_def + inner_section)

value = Group(Group(Combine(OneOrMore(identifier|White(' ')))) + Suppress(":") + Group(Combine(ZeroOrMore(Word(alphanums+'-/=_".')|White(' ', max=1)))))
prop_name = Literal("Properties:")
prop_section = indentedBlock(stmt, indentStack)
prop_val = Group(Group(identifier) + Suppress("=")  + Group(Combine(OneOrMore(Word(alphanums+'-"/.')|White(' \t')))))
prop = (prop_name + prop_section)

stmt << ( section | prop | ("Argument:") | value | prop_val )

syntax = OneOrMore(stmt)

parseTree = syntax.parseString(data)
parseTree.pprint()

This gives:

$ ./pactl.py

Module #6
    Argument:
    Name: module-alsa-card
    Usage counter: 0
    Properties:
        module.author = "Lennart Poettering"
        module.description = "ALSA Card"
        module.version = "14.0-rebootstrapped"
[[['Module'], ['6']],
 [['Argument:'],
  [[['Name'], ['module-alsa-card']]],
  [[['Usage counter'], ['0']]],
  ['Properties:',
   [[[['module.author'], ['"Lennart Poettering"']]],
    [[['module.description'], ['"ALSA Card"']]],
    [[['module.version'], ['"14.0-rebootstrapped"']]]]]]]

So far so good, but removing the special case for Argument: breaks the parse, because ZeroOrMore does not behave as I expected:

#!/usr/bin/env python

#
# parsing pactl list
#

from pyparsing import *
import os
from subprocess import check_output
import sys

data = '''
Module #6
    Argument:
    Name: module-alsa-card
    Usage counter: 0
    Properties:
        module.author = "Lennart Poettering"
        module.description = "ALSA Card"
        module.version = "14.0-rebootstrapped"
'''

indentStack = [1]
stmt = Forward()

identifier = Word(alphanums+"-_.")

sect_def = Group(Group(identifier) + Suppress("#") + Group(Word(nums)))
inner_section = indentedBlock(stmt, indentStack)
section = (sect_def + inner_section)

value = Group(Group(Combine(OneOrMore(identifier|White(' ')))) + Suppress(":") + Group(Combine(ZeroOrMore(Word(alphanums+'-/=_".')|White(' ', max=1))))).setDebug()
prop_name = Literal("Properties:")
prop_section = indentedBlock(stmt, indentStack)
prop_val = Group(Group(identifier) + Suppress("=")  + Group(Combine(OneOrMore(Word(alphanums+'-"/.')|White(' \t')))))
prop = (prop_name + prop_section)

stmt << ( section | prop | value | prop_val )


syntax = OneOrMore(stmt)

parseTree = syntax.parseString(data)
parseTree.pprint()

This results in:

$ ./pactl.py

Module #6
    Argument:
    Name: module-alsa-card
    Usage counter: 0
    Properties:
        module.author = "Lennart Poettering"
        module.description = "ALSA Card"
        module.version = "14.0-rebootstrapped"
Match Group:({Group:(Combine:({{W:(ABCD...) | <SP>}}...)) Suppress:(":") Group:(Combine:([{W:(ABCD...) | <SP>}]...))}) at loc 19(3,9)
Matched Group:({Group:(Combine:({{W:(ABCD...) | <SP>}}...)) Suppress:(":") Group:(Combine:([{W:(ABCD...) | <SP>}]...))}) -> [[['Argument'], ['Name']]]
Match Group:({Group:(Combine:({{W:(ABCD...) | <SP>}}...)) Suppress:(":") Group:(Combine:([{W:(ABCD...) | <SP>}]...))}) at loc 1(2,1)
Exception raised:Expected ":", found '#'  (at char 8), (line:2, col:8)
Traceback (most recent call last):
  File "/home/alberto/projects/node/pacmd_list_json/./pactl.py", line 55, in <module>
    parseTree = syntax.parseString(partial)
  File "/usr/local/lib/python3.9/site-packages/pyparsing.py", line 1955, in parseString
    raise exc
  File "/usr/local/lib/python3.9/site-packages/pyparsing.py", line 6336, in checkUnindent
    raise ParseException(s, l, "not an unindent")
pyparsing.ParseException: Expected {{Group:({Group:(W:(ABCD...)) Suppress:("#") Group:(W:(0123...))}) indented block} | {"Properties:" indented block} | Group:({Group:(Combine:({{W:(ABCD...) | <SP>}}...)) Suppress:(":") Group:(Combine:([{W:(ABCD...) | <SP>}]...))}) | Group:({Group:(W:(ABCD...)) Suppress:("=") Group:(Combine:({{W:(ABCD...) | <SP><TAB>}}...))})}, found ':'  (at char 41), (line:4, col:13)

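As the setDebug output for the value expression shows, ZeroOrMore is picking up a token from the next line: [[['Argument'], ['Name']]]

I tried LineEnd() and other tricks, but none of them worked.

Any ideas on how to make ZeroOrMore stop at LineEnd(), or how to handle this without special cases?

Note: the actual output can be retrieved with: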

env = os.environ.copy()
env['LANG'] = 'C'
data = check_output(
    ['pactl', 'list'], universal_newlines=True, env=env)

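indentedBlock is not the easiest pyparsing element to work with, but a few of the things you are doing are getting in your way.

To debug this, I broke up some of your more complex expressions, gave them names using setName(), and then added .setDebug(). Like this: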

identifier = Word(alphas, alphanums+"-_.").setName("identifier").setDebug()

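This tells pyparsing to print a message whenever it attempts to match this expression, followed by the matched tokens if the match succeeds, or the exception that was raised if it fails.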

Match identifier at loc 1(2,1)
Matched identifier -> ['Module']
Match identifier at loc 15(3,5)
Matched identifier -> ['Argument']
Match identifier at loc 15(3,5)
Matched identifier -> ['Argument']
Match identifier at loc 23(3,13)
Exception raised:Expected identifier, found ':'  (at char 23), (line:3, col:13)
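
As a small illustration of why naming helps, the name set with setName() is also what appears in the "Expected identifier" text of the failure message above. The sketch below reuses the identifier expression from this answer without setDebug(); the input string is made up for the example, and the exact wording after "Expected identifier" may differ between pyparsing versions.

```python
from pyparsing import ParseException, Word, alphanums, alphas

# Same identifier expression as above, named but without debugging output.
identifier = Word(alphas, alphanums + "-_.").setName("identifier")

try:
    # Hypothetical input that starts with ':' and so cannot be an identifier.
    identifier.parseString(": no label here")
    message = ""
except ParseException as exc:
    # The failure is reported using the name given with setName().
    message = str(exc)

print(message)
```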

It looks like these expressions are messing up the indentedBlock matching, by consuming whitespace that should be treated as indentation:

Combine(OneOrMore(Word(alphanums+'-"/.')|White(' \t')))

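The " character in the Word, together with the whitespace, leads me to believe you are trying to match quoted strings. I replaced this expression with: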

Combine(OneOrMore(Word(alphas, alphanums+'-/.') | quotedString))
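
To make the difference concrete, here is a minimal sketch comparing the two expressions on one value from the sample data. The variable names old_value, new_value and sample are only for this illustration, and the two trailing spaces in sample are added on purpose to show how each expression treats whitespace after the value.

```python
from pyparsing import (Combine, Group, OneOrMore, Suppress, White, Word,
                       alphanums, alphas, quotedString)

# Original expression: whitespace is matched explicitly as part of the value.
old_value = Combine(OneOrMore(Word(alphanums + '-"/.') | White(" \t")))
# Replacement: a quoted string is matched as one token, whitespace is not.
new_value = Combine(OneOrMore(Word(alphas, alphanums + "-/.") | quotedString))

sample = '"ALSA Card"  '  # two trailing spaces, added for the illustration
old_tokens = old_value.parseString(sample).asList()
new_tokens = new_value.parseString(sample).asList()
print(repr(old_tokens[0]))  # trailing spaces are kept in the value
print(repr(new_tokens[0]))  # the value ends at the closing quote

# The replacement used inside a property line from the sample data.
identifier = Word(alphas, alphanums + "-_.")
prop_val = Group(identifier + Suppress("=") + Group(new_value))
line_tokens = prop_val.parseString('module.author = "Lennart Poettering"').asList()
print(line_tokens)
```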

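You also need to take care not to read past the end of the line, or you will also mess up indentedBlock's indentation tracking. I added this expression for a newline at the top: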

NL = LineEnd()

Then I used it as the stopOn argument to OneOrMore and ZeroOrMore:

prop_val_value = Combine(OneOrMore(Word(alphas, alphanums+'-/.') | quotedString(), stopOn=NL)).setName("prop_val_value")#.setDebug()
prop_val = Group(identifier + Suppress("=")  + Group(prop_val_value)).setName("prop_val")#.setDebug()
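
The effect of stopOn can be seen in isolation with a stripped-down sketch. It assumes simplified label and word expressions (not the ones from the full parser) and a two-line excerpt of the sample data. As far as I can tell, stopOn alone may not be enough when the value is empty, because whitespace skipping can step over the newline before the repetition starts, so the bounded version also puts a ~NL negative lookahead in front of the value.

```python
from pyparsing import (Group, LineEnd, OneOrMore, Optional, Suppress, Word,
                       ZeroOrMore, alphanums, alphas)

NL = LineEnd()
label = Word(alphas)            # simplified label, for illustration only
word = Word(alphanums + "-")    # simplified value token, for illustration only

# Unbounded: default whitespace skipping includes newlines, so the value
# of "Argument:" spills over onto the next line.
greedy = Group(label + Suppress(":") + Group(ZeroOrMore(word)))

# Bounded: ~NL refuses to start a value at the end of the line, and
# stopOn=NL ends the repetition when the end of the line is reached.
bounded = Group(label + Suppress(":")
                + Optional(~NL + Group(ZeroOrMore(word, stopOn=NL))))

text = "Argument:\nName: module-alsa-card"

greedy_tokens = greedy.parseString(text).asList()
bounded_tokens = OneOrMore(bounded).parseString(text).asList()
print(greedy_tokens)   # 'Name' is wrongly taken as the value of 'Argument'
print(bounded_tokens)  # 'Argument' has no value, 'Name' keeps its own
```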

Here is the parser I ended up with:

indentStack = [1]
stmt = Forward()
NL = LineEnd()

identifier = Word(alphas, alphanums+"-_.").setName("identifier").setDebug()

sect_def = Group(Group(identifier) + Suppress("#") + Group(Word(nums))).setName("sect_def")#.setDebug()
inner_section = indentedBlock(stmt, indentStack)
section = (sect_def + inner_section)

#~ value = Group(Group(Combine(OneOrMore(identifier|White(' ')))) + Suppress(":") + Group(Combine(ZeroOrMore(Word(alphanums+'-/=_".')|White(' ', max=1))))).setDebug()
value_label = originalTextFor(OneOrMore(identifier)).setName("value_label")#.setDebug()
value = Group(value_label
              + Suppress(":")
              + Optional(~NL + Group(Combine(ZeroOrMore(Word(alphanums+'-/=_.') | quotedString(), stopOn=NL))))).setName("value")#.setDebug()
prop_name = Literal("Properties:")
prop_section = indentedBlock(stmt, indentStack)
#~ prop_val = Group(Group(identifier) + Suppress("=")  + Group(Combine(OneOrMore(Word(alphanums+'-"/.')|White(' \t')))))
prop_val_value = Combine(OneOrMore(Word(alphas, alphanums+'-/.') | quotedString(), stopOn=NL)).setName("prop_val_value")#.setDebug()
prop_val = Group(identifier + Suppress("=") + Group(prop_val_value)).setName("prop_val")#.setDebug()
prop = (prop_name + prop_section).setName("prop")#.setDebug()

stmt << ( section | prop | value | prop_val )

Which gives this:

[[['Module'], ['6']],
 [[['Argument']],
  [['Name', ['module-alsa-card']]],
  [['Usage counter', ['0']]],
  ['Properties:',
   [[['module.author', ['"Lennart Poettering"']]],
    [['module.description', ['"ALSA Card"']]],
    [['module.version', ['"14.0-rebootstrapped"']]]]]]]