如何创建一个考虑“|”的抽象语法树? (层/ Yacc)
How can I create an abstract syntax tree considering '|'? (Ply / Yacc)
考虑以下语法:
expr : expr '+' term | expr '-' term | term
term : term '*' factor | term '/' factor | factor
factor : '(' expr ')' | identifier | number
这是我使用 ply 的代码:
from ply import lex, yacc
tokens = [
"identifier",
"number",
"plus",
"minus",
"mult",
"div"
]
t_ignore = r" \t"
t_identifier = r"^[a-zA-Z]+$"
t_number = r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?"
t_plus = r"\+"
t_minus = r"-"
t_mult = r"\*"
t_div = r"/"
def p_stmt(p):
"""stmt : expr"""
p[0] = ("stmt", p[1])
def p_expr(p):
"""expr : expr plus term
| expr minus term
| term"""
p[0] = ("expr", p[1], p[2]) # Problem here <<<
def p_term(p):
"""term : term mult factor
| term div factor
| factor"""
def p_factor(p):
"""factor : '(' expr ')'
| identifier
| number"""
if __name__ == "__main__":
lex.lex()
yacc.yacc()
data = "32 + 10"
result = yacc.parse(data)
print(result)
如果我无法访问运算符,我应该如何使用表达式构建 AST?我可以像 p_expr_plus 这样分隔函数,但在这种情况下,我会消除运算符优先级。 docs are not so helpful, since I'm a beginner and can't solve this problem. The best material I've found on the subject is this,但是没有考虑运算符优先级的复杂度。
编辑:我无法访问 p2 or p[3], since I get an IndexError (It's matching the term only). In the PDF I've linked, they explicitly put the operator inside the tuple, like: ('+', p1, p2),因此,证明了我考虑优先级的问题(我无法分离函数,表达式是表达式,应该有一种方法可以考虑管道并访问任何操作员)。
据我所知,在 p[0] = ("expr", p[1], p[2])
中,p[1]
是左手表达式,p[2]
是运算符,p[3]
(你没有使用)将是正确的术语。
只需使用 p[2]
来确定运算符,添加 p[3]
,因为您将需要它,您应该可以开始了。
此外,您必须验证 p
有多少项目,因为如果最后一条规则 | term"""
匹配,p
将只有两个项目而不是四个。
中的片段
def p_comparison(p):
"""comparison : comparison PLUS comparison
| comparison MINUS comparison
| comparison MULT comparison
| comparison DIV comparison
| comparison LT comparison
| comparison EQ comparison
| comparison GT comparison
| PLUS comparison
| MINUS comparison
| power"""
if len(p) == 4:
p[0] = binary_ops[p[2]]((p[1], p[3]))
elif len(p) == 3:
p[0] = unary_ops[p[1]](p[2])
else:
p[0] = p[1]
考虑以下语法:
expr : expr '+' term | expr '-' term | term
term : term '*' factor | term '/' factor | factor
factor : '(' expr ')' | identifier | number
这是我使用 ply 的代码:
from ply import lex, yacc
tokens = [
"identifier",
"number",
"plus",
"minus",
"mult",
"div"
]
t_ignore = r" \t"
t_identifier = r"^[a-zA-Z]+$"
t_number = r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?"
t_plus = r"\+"
t_minus = r"-"
t_mult = r"\*"
t_div = r"/"
def p_stmt(p):
"""stmt : expr"""
p[0] = ("stmt", p[1])
def p_expr(p):
"""expr : expr plus term
| expr minus term
| term"""
p[0] = ("expr", p[1], p[2]) # Problem here <<<
def p_term(p):
"""term : term mult factor
| term div factor
| factor"""
def p_factor(p):
"""factor : '(' expr ')'
| identifier
| number"""
if __name__ == "__main__":
lex.lex()
yacc.yacc()
data = "32 + 10"
result = yacc.parse(data)
print(result)
如果我无法访问运算符,我应该如何使用表达式构建 AST?我可以像 p_expr_plus 这样分隔函数,但在这种情况下,我会消除运算符优先级。 docs are not so helpful, since I'm a beginner and can't solve this problem. The best material I've found on the subject is this,但是没有考虑运算符优先级的复杂度。
编辑:我无法访问 p2 or p[3], since I get an IndexError (It's matching the term only). In the PDF I've linked, they explicitly put the operator inside the tuple, like: ('+', p1, p2),因此,证明了我考虑优先级的问题(我无法分离函数,表达式是表达式,应该有一种方法可以考虑管道并访问任何操作员)。
据我所知,在 p[0] = ("expr", p[1], p[2])
中,p[1]
是左手表达式,p[2]
是运算符,p[3]
(你没有使用)将是正确的术语。
只需使用 p[2]
来确定运算符,添加 p[3]
,因为您将需要它,您应该可以开始了。
此外,您必须验证 p
有多少项目,因为如果最后一条规则 | term"""
匹配,p
将只有两个项目而不是四个。
def p_comparison(p):
"""comparison : comparison PLUS comparison
| comparison MINUS comparison
| comparison MULT comparison
| comparison DIV comparison
| comparison LT comparison
| comparison EQ comparison
| comparison GT comparison
| PLUS comparison
| MINUS comparison
| power"""
if len(p) == 4:
p[0] = binary_ops[p[2]]((p[1], p[3]))
elif len(p) == 3:
p[0] = unary_ops[p[1]](p[2])
else:
p[0] = p[1]