pyparsing/nestedExpr - how to match curly brackets that are used for nesting but are also part of literals?
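I am trying to parse a file in the format shown below. Sometimes the literals I need to extract verbatim contain the same characters that nestedExpr uses for nesting.
Input: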
{
# some comment
location 1 {
command 1
}
# this item is commented out
# location 2 {
# command 2
# }
location 3 {
command 3 /tmp; "PATH=/usr/bin:${PATH} ./abc.bat"
}
location 4 {
command 4 -c "PATH=/usr/local/bin:${PATH} ls -l"
}
}
Actual output:
[['# some comment',
'location 1 ',
['command 1'],
'# this item is commented out',
'# location 2 ',
['# command 2', '# '],
'location 3 ',
['command 3 /tmp; "PATH=/usr/bin:$', ['PATH'], './abc.bat"'],
'location 4 ',
['command 4 -c "PATH=/usr/local/bin:$', ['PATH'], 'ls -l"']]]
Expected output:
[['# some comment',
'location 1 ',
['command 1'],
'# this item is commented out',
'# location 2 ',
['# command 2', '# '],
'location 3 ',
['command 3 /tmp; "PATH=/usr/bin:${PATH} ./abc.bat"'],
'location 4 ',
['command 4 -c "PATH=/usr/local/bin:${PATH} ls -l"']]]
As you can see, I want my script to return "${PATH}" as-is instead of parsing it into a nested list.
The code I have tried is below. Any help is much appreciated.
from pyparsing import nestedExpr, Combine, Literal, OneOrMore, CharsNotIn
from pprint import pprint
content = Combine(OneOrMore(~Literal("{")
                            + ~Literal("}")
                            + CharsNotIn('\n', exact=1)))
parser = nestedExpr(opener='{', closer='}', content=content)
inputStr = ''' {
# some comment
location 1 {
command 1
}
# this item is commented out
# location 2 {
# command 2
# }
location 3 {
command 3 /tmp; "PATH=/usr/bin:${PATH} ./abc.bat"
}
location 4 {
command 4 -c "PATH=/usr/local/bin:${PATH} ls -l"
}
}'''
output = parser.parseString(inputStr, parseAll=True).asList()
pprint(output)
You have to extend the definition of content so that it explicitly detects these "${...}" elements before it falls back to the CharsNotIn term.
dollar = Literal("$")
# the `...` below is real pyparsing syntax (skip-to shorthand): match everything up to the next "}"
substitution_expr = Combine(dollar + "{" + ... + "}")

content = Combine(OneOrMore(~Literal("{")
                            + ~Literal("}")
                            + (substitution_expr | CharsNotIn('\n', exact=1))
                            )
                  )

# content could be simplified to just this
content = Combine(OneOrMore(substitution_expr
                            | CharsNotIn('{}\n', exact=1))
                  )
With this change, I get:
[['# some comment',
'location 1 ',
['command 1'],
'# this item is commented out',
'# location 2 ',
['# command 2', '# '],
'location 3 ',
['command 3 /tmp; "PATH=/usr/bin:${PATH} ./abc.bat"'],
'location 4 ',
['command 4 -c "PATH=/usr/local/bin:${PATH} ls -l"']]]
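For reference, here is a minimal self-contained sketch that combines the simplified content definition with nestedExpr. The sample string is a shortened stand-in for the input in the question, not the full file, and the sketch assumes a pyparsing version recent enough to support the `...` skip-to shorthand.

```python
from pprint import pprint

from pyparsing import CharsNotIn, Combine, Literal, OneOrMore, nestedExpr

# "${...}" is matched as one literal token, so its braces never reach nestedExpr.
substitution_expr = Combine(Literal("$") + "{" + ... + "}")

# Try the substitution first; otherwise take any single character
# that is not a brace or a newline.
content = Combine(OneOrMore(substitution_expr | CharsNotIn("{}\n", exact=1)))

parser = nestedExpr(opener="{", closer="}", content=content)

# Shortened sample input (an illustrative assumption, not the original file).
sample = """{
location 4 {
command 4 -c "PATH=/usr/local/bin:${PATH} ls -l"
}
}"""

result = parser.parseString(sample, parseAll=True).asList()
pprint(result)
```

The order of the alternatives is the important part: because substitution_expr comes before CharsNotIn, the "$" and its braces are consumed as a single unit, and the command line comes back as one string with ${PATH} intact.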