pyparsing/nestedExpr - how to match curly brackets that are used for nesting but are also part of literals?

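I am trying to parse a file in the format below, where some literals that I need to extract verbatim contain the same characters that nestedExpr uses for nesting.

Input: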

      {
# some comment
        location 1 {
            command 1
        }
# this item is commented out
#        location 2 {
#            command 2
#        }
        location 3 {
            command 3 /tmp; "PATH=/usr/bin:${PATH} ./abc.bat"
        }
        location 4 {
            command 4 -c "PATH=/usr/local/bin:${PATH} ls -l"
        }
    }

Actual output:

[['# some comment',
  'location 1 ',
  ['command 1'],
  '# this item is commented out',
  '#        location 2 ',
  ['#            command 2', '#        '],
  'location 3 ',
  ['command 3 /tmp; "PATH=/usr/bin:$', ['PATH'], './abc.bat"'],
  'location 4 ',
  ['command 4 -c "PATH=/usr/local/bin:$', ['PATH'], 'ls -l"']]]

Expected output:

[['# some comment',
  'location 1 ',
  ['command 1'],
  '# this item is commented out',
  '#        location 2 ',
  ['#            command 2', '#        '],
  'location 3 ',
  ['command 3 /tmp; "PATH=/usr/bin:${PATH} ./abc.bat"'],
  'location 4 ',
  ['command 4 -c "PATH=/usr/local/bin:${PATH} ls -l"']]]

As you can see, I want my script to return "${PATH}" verbatim rather than parsing it into a nested list.

Below is the code I tried; any help is much appreciated.

from pyparsing import nestedExpr, Combine, Literal, OneOrMore, CharsNotIn
from pprint import pprint
content =  Combine(OneOrMore(~Literal("{")
                                  + ~Literal("}") 
                                  + CharsNotIn('\n',exact=1)))
parser = nestedExpr(opener='{', closer='}', content=content)


inputStr = '''      {
# some comment
        location 1 {
            command 1
        }
# this item is commented out
#        location 2 {
#            command 2
#        }
        location 3 {
            command 3 /tmp; "PATH=/usr/bin:${PATH} ./abc.bat"
        }
        location 4 {
            command 4 -c "PATH=/usr/local/bin:${PATH} ls -l"
        }
    }'''


output = parser.parseString(inputStr, parseAll=True).asList()

pprint(output)

You have to expand your definition of content so that it explicitly detects these "${...}" elements before it tries the CharsNotIn term.

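# Match a whole "${...}" substitution as one token. The bare `...` below is a
# real Python Ellipsis, which recent pyparsing versions read as "skip ahead to
# the next term" (here, the closing "}").
dollar = Literal("$")
substitution_expr = Combine(dollar + "{" + ... + "}")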
content = Combine(OneOrMore(~Literal("{")
                            + ~Literal("}")
                            + (substitution_expr | CharsNotIn('\n', exact=1))
                            )
                  )

# content could be simplified to just this
content = Combine(OneOrMore(substitution_expr 
                            | CharsNotIn('{}\n', exact=1))
                  )

With this change I get:

[['# some comment',
  'location 1 ',
  ['command 1'],
  '# this item is commented out',
  '#        location 2 ',
  ['#            command 2', '#        '],
  'location 3 ',
  ['command 3 /tmp; "PATH=/usr/bin:${PATH} ./abc.bat"'],
  'location 4 ',
  ['command 4 -c "PATH=/usr/local/bin:${PATH} ls -l"']]]
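
For anyone who wants to try this on a smaller input, here is a minimal self-contained sketch of the simplified content definition. It is an illustration rather than the exact code above: it spells out SkipTo("}") explicitly instead of using the `...` shorthand (which, as far as I know, needs a reasonably recent pyparsing), and the `sample` string and the `result` name are made up for the example.

```python
from pprint import pprint

from pyparsing import CharsNotIn, Combine, Literal, OneOrMore, SkipTo, nestedExpr

# A "${...}" substitution is consumed as one unit, so its braces never reach
# nestedExpr. SkipTo("}") stands in for the `...` shorthand (an assumption
# made here for compatibility with older pyparsing releases).
substitution_expr = Combine(Literal("$") + "{" + SkipTo("}") + "}")

# The substitution has to be tried first: CharsNotIn would otherwise take the
# "$" as an ordinary character and then stop at the "{" that follows it.
content = Combine(OneOrMore(substitution_expr | CharsNotIn("{}\n", exact=1)))

parser = nestedExpr(opener="{", closer="}", content=content)

# Hypothetical trimmed-down input, modeled on "location 3" from the question.
sample = '''{
    location 3 {
        command 3 "PATH=/usr/bin:${PATH} ./abc.bat"
    }
}'''

result = parser.parseString(sample, parseAll=True).asList()
pprint(result)
```

The command line should come back as a single string with "${PATH}" intact, nested one level below 'location 3 '.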