NodeVisitor class for PEG parser in Python
Imagine the following type of string:
if ((a1 and b) or (a2 and c)) or (c and d) or (e and f)
Now I would like to get the expressions in parentheses, so I wrote a PEG parser with the following grammar:
from parsimonious.grammar import Grammar
grammar = Grammar(
    r"""
    program = if expr+
    expr = term (operator term)*
    term = (factor operator factor) / factor
    factor = (lpar word operator word rpar) / (lpar expr rpar)
    if = "if" ws
    and = "and"
    or = "or"
    operator = ws? (and / or) ws?
    word = ~"\w+"
    lpar = "("
    rpar = ")"
    ws = ~"\s*"
    """)
This parses just fine with
tree = grammar.parse(string)
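Now the question: how do I write a NodeVisitor class for this tree that gets only the factors? My problem is the second branch of factor, which can be deeply nested.
I tried with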
def walk(node, level = 0):
    if node.expr.name == "factor":
        print(level * "-", node.text)
    for child in node.children:
        walk(child, level + 1)
walk(tree)
but to no avail, really (factors bubble up in duplicates).
Note: This question is based on another one on Stack Overflow.
If you only want to return each outermost factor, return early and do not descend into its children.
def walk(node, level = 0):
    if node.expr.name == "factor":
        print(level * "-", node.text)
        return
    for child in node.children:
        walk(child, level + 1)
Output:
----- ((a1 and b) or (a2 and c))
----- (c and d)
------ (e and f)
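If the factors are needed as values rather than printed lines, the same early-return walk can be written as a generator. The following is only a sketch: `outermost` and `make_node` are names made up for illustration, and `make_node` builds a small hand-written stand-in tree that exposes just the `.expr.name`, `.text` and `.children` attributes used by `walk` above, so the snippet runs without the parser.

```python
from types import SimpleNamespace


def outermost(node, name="factor"):
    """Yield the text of every outermost node whose rule is `name`."""
    if node.expr.name == name:
        yield node.text
        return  # do not descend: nested matches are already part of this text
    for child in node.children:
        yield from outermost(child, name)


def make_node(name, text, children=()):
    # Hypothetical stand-in for a parse-tree node (not a parsimonious class).
    return SimpleNamespace(
        expr=SimpleNamespace(name=name), text=text, children=list(children)
    )


# A tiny hand-built tree: one factor nested inside another, plus a sibling factor.
inner = make_node("factor", "(a1 and b)")
outer = make_node("factor", "((a1 and b))",
                  [make_node("expr", "(a1 and b)", [inner])])
tree = make_node("program", "if ((a1 and b)) or (c and d)", [
    make_node("expr", "((a1 and b)) or (c and d)",
              [outer, make_node("factor", "(c and d)")]),
])

factors = list(outermost(tree))
print(factors)  # ['((a1 and b))', '(c and d)']
```

Because the function only relies on those three attributes, it should accept the tree returned by `grammar.parse(...)` in the same way `walk` does.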
How would I go about it to get ((a1 and b) or (a2 and c)), (c and d) and (e and f) as three parts?
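You could create a visitor that "listens" for a ( node in the parse tree and increments a depth variable, and decrements that variable again when it encounters a ). Then, in the method that is called for the parenthesized expression, check the depth before adding the expression to the list the visitor returns, so that only expressions seen at depth 0 are kept.
Here is a quick example: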
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor
grammar = Grammar(
    r"""
    program = if expr+
    expr = term (operator term)*
    term = (lpar expr rpar) / word
    if = "if" ws
    and = "and"
    or = "or"
    operator = ws? (and / or) ws?
    word = ~"\w+"
    lpar = "("
    rpar = ")"
    ws = ~"\s*"
    """)
class ParExprVisitor(NodeVisitor):
    def __init__(self):
        self.depth = 0
        self.par_expr = []
    def visit_term(self, node, visited_children):
        if self.depth == 0:
            self.par_expr.append(node.text)
    def visit_lpar(self, node, visited_children):
        self.depth += 1
    def visit_rpar(self, node, visited_children):
        self.depth -= 1
    def generic_visit(self, node, visited_children):
        return self.par_expr
tree = grammar.parse("if ((a1 and b) or (a2 and c)) or (c and d) or (e and f)")
visitor = ParExprVisitor()
for expr in visitor.visit(tree):
    print(expr)
Prints:
((a1 and b) or (a2 and c))
(c and d)
(e and f)
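The depth bookkeeping itself does not depend on the parse tree. As a rough illustration of the same idea, here is a sketch that applies the counter directly to the raw string instead of to lpar and rpar nodes. It is not the visitor above, just the same increment/decrement logic in isolation, and `top_level_groups` is a name chosen for this example.

```python
def top_level_groups(text):
    """Return every parenthesized group that is not nested inside another one."""
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # remember where an outermost group opens
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError(f"unbalanced ')' at index {i}")
            depth -= 1
            if depth == 0:
                groups.append(text[start:i + 1])  # outermost group just closed
    if depth != 0:
        raise ValueError("unbalanced '('")
    return groups


parts = top_level_groups("if ((a1 and b) or (a2 and c)) or (c and d) or (e and f)")
print(parts)
# ['((a1 and b) or (a2 and c))', '(c and d)', '(e and f)']
```

Unlike the grammar-based visitor, this version does not validate anything other than parenthesis balance, so it is only useful as a quick check or when a full parse is not needed.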