Python PLY issue with if-else and while statements

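My if and while statements keep raising syntax errors from p_error(p), and PLY tells me there are conflicts when I run it. The problems come from the if-else and while statements, because everything worked before I added them. Any help would be appreciated.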

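If possible, please don't change the implementation too much, even if it is bad practice. I just want help understanding it; I don't want a complete overhaul (that would be plagiarism).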

import ply.lex as lex
import ply.yacc as yacc

# === Lexical tokens component ===

# List of possible token names that can be produced by the lexer
# NAME: variable name, L/RPAREN: Left/Right Parenthesis
tokens = (
    'NAME', 'NUMBER',
    'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'MODULO', 'EQUALS',
    'LPAREN', 'RPAREN',
    'IF', 'ELSE', 'WHILE',
    'EQUAL', 'NOTEQ', 'LARGE', 'SMALL', 'LRGEQ', 'SMLEQ',
)

# Regular expression rules for tokens format: t_<TOKEN>
# Simple tokens: regex for literals +,-,*,/,%,=,(,) and variable names (alphanumeric)
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_MODULO  = r'%'
t_EQUALS  = r'='
t_LPAREN  = r'\('
t_RPAREN  = r'\)'
t_NAME    = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_IF      = r'if'
t_ELSE    = r'else'
t_WHILE   = r'while'
t_EQUAL   = r'\=\='
t_NOTEQ   = r'\!\='
t_LARGE   = r'\>'
t_SMALL   = r'\<'
t_LRGEQ   = r'\>\='
t_SMLEQ   = r'\<\='


# complex tokens
# number token
def t_NUMBER(t):
    r'\d+'  # digit special character regex
    t.value = int(t.value)  # convert str -> int
    return t


# Ignored characters
t_ignore = " \t"  # spaces & tabs regex

# newline character
def t_newline(t):
    r'\n+'  # newline special character regex
    t.lexer.lineno += t.value.count("\n")  # increase current line number accordingly


# error handling for invalid character
def t_error(t):
    print("Illegal character '%s'" % t.value[0])  # print error message with causing character
    t.lexer.skip(1)  # skip invalid character


# Build the lexer
lex.lex()

# === Yacc parsing/grammar component ===

# Precedence & associative rules for the arithmetic operators
# 1. Unary, right-associative minus.
# 2. Binary, left-associative multiplication, division, and modulus
# 3. Binary, left-associative addition and subtraction
# Parenthesis precedence defined through the grammar
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE', 'MODULO'),
    ('right', 'UMINUS'),
)

# dictionary of names (for storing variables)
names = {}

# --- Grammar:
# <statement> -> NAME = <expression> | <expression>
# <expression> -> <expression> + <expression>
#               | <expression> - <expression>
#               | <expression> * <expression>
#               | <expression> / <expression>
#               | <expression> % <expression>
#               | - <expression>
#               | ( <expression> )
#               | NUMBER
#               | NAME
# ---
# defined below using function definitions with format string/comment
# followed by logic of changing state of engine


# if statement
def p_statement_if(p):
    '''statement : IF LPAREN comparison RPAREN statement
                    | IF LPAREN comparison RPAREN statement ELSE statement'''
    if p[3]:
        p[0] = p[5]
    else:
        if p[7] is not None:
            p[0] = p[7]


def p_statement_while(p):
    'statement : WHILE LPAREN comparison RPAREN statement'
    while(p[3]):
        p[5];


# assignment statement: <statement> -> NAME = <expression>
def p_statement_assign(p):
    'statement : NAME EQUALS expression'
    names[p[1]] = p[3]  # PLY engine syntax, p stores parser engine state


# expression statement: <statement> -> <expression>
def p_statement_expr(p):
    'statement : expression'
    print(p[1])


# comparison
def p_comparison_binop(p):
    '''comparison : expression EQUAL expression
                          | expression NOTEQ expression
                          | expression LARGE expression
                          | expression SMALL expression
                          | expression LRGEQ expression
                          | expression SMLEQ expression'''
    if p[2] == '==':
        p[0] = p[1] == p[3]
    elif p[2] == '!=':
        p[0] = p[1] != p[3]
    elif p[2] == '>':
        p[0] = p[1] > p[3]
    elif p[2] == '<':
        p[0] = p[1] < p[3]
    elif p[2] == '>=':
        p[0] = p[1] >= p[3]
    elif p[2] == '<=':
        p[0] = p[1] <= p[3]


# binary operator expression: <expression> -> <expression> + <expression>
#                                          | <expression> - <expression>
#                                          | <expression> * <expression>
#                                          | <expression> / <expression>
#                                          | <expression> % <expression>
def p_expression_binop(p):
    '''expression : expression PLUS expression
                          | expression MINUS expression
                          | expression TIMES expression
                          | expression DIVIDE expression
                          | expression MODULO expression'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]
    elif p[2] == '*':
        p[0] = p[1] * p[3]
    elif p[2] == '/':
        p[0] = p[1] / p[3]
    elif p[2] == '%':
        p[0] = p[1] % p[3]


# unary minus operator expression: <expression> -> - <expression>
def p_expression_uminus(p):
    'expression : MINUS expression %prec UMINUS'
    p[0] = -p[2]


# parenthesis group expression: <expression> -> ( <expression> )
def p_expression_group(p):
    'expression : LPAREN expression RPAREN'
    p[0] = p[2]


# number literal expression: <expression> -> NUMBER
def p_expression_number(p):
    'expression : NUMBER'
    p[0] = p[1]


# variable name literal expression: <expression> -> NAME
def p_expression_name(p):
    'expression : NAME'
    # attempt to lookup variable in current dictionary, throw error if not found
    try:
        p[0] = names[p[1]]
    except LookupError:
        print("Undefined name '%s'" % p[1])
        p[0] = 0


# handle parsing errors
def p_error(p):
    print("Syntax error at '%s'" % p.value)


# build parser
yacc.yacc()

# start interpreter and accept input using commandline/console
while True:
    try:
        s = input('calc > ')  # get user input. use raw_input() on Python 2
    except EOFError:
        break
    yacc.parse(s)  # parse user input string

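Your basic problem is that your lexer does not recognize the keywords if and while (nor else), because the t_NAME pattern is triggered in those cases. The problem and a possible solution are described in section 4.3 of the Ply documentation. The problem is that: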

Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).

and the expression for t_NAME is longer than the simple keyword patterns.
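The effect of that ordering rule can be imitated with the standard re module. This is only a simplified imitation, not PLY's actual code: it assumes the string-defined patterns are sorted by decreasing length and joined into one alternation, and the helper name token_type is made up for the illustration.

```python
import re

# String-defined token patterns copied from the question's lexer (subset).
patterns = {
    'IF': r'if',
    'ELSE': r'else',
    'WHILE': r'while',
    'NAME': r'[a-zA-Z_][a-zA-Z0-9_]*',
}

# Longer regular expressions come first, as the quoted rule says.
ordered = sorted(patterns.items(), key=lambda item: len(item[1]), reverse=True)

# Alternatives in a Python regex are tried left to right, so the order decides.
master = re.compile('|'.join('(?P<%s>%s)' % (name, rx) for name, rx in ordered))


def token_type(text):
    """Return the name of the pattern that matches at the start of text."""
    m = master.match(text)
    return m.lastgroup if m else None


for word in ('if', 'else', 'while', 'x1'):
    print(word, '->', token_type(word))
```

Every word, including the three keywords, comes out as NAME, which is why the parser never sees an IF, ELSE or WHILE token.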

You cannot fix this simply by turning t_NAME into a lexer function, because tokens defined by functions are checked before tokens defined by strings.

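But you can make t_NAME a function and, inside that function, look up the matched string in a dictionary to see whether it is a reserved word. (See the example at the end of the linked section, in the paragraph starting "To handle reserved words...".) When you do this, you do not define t_IF, t_WHILE and t_ELSE at all.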
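A minimal sketch of that approach follows. It assumes the string rules t_IF, t_ELSE and t_WHILE are deleted and that the tokens tuple still lists 'IF', 'ELSE' and 'WHILE' as in the question. The dictionary name reserved is a choice made here, and SimpleNamespace is only a stand-in for the token object PLY would pass to the rule, so that the function can be tried on its own.

```python
from types import SimpleNamespace

# Keyword text -> token type (token names taken from the question's tokens tuple).
reserved = {
    'if': 'IF',
    'else': 'ELSE',
    'while': 'WHILE',
}


def t_NAME(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    # Reclassify the match as a keyword token when it is a reserved word;
    # anything else stays an ordinary NAME.
    t.type = reserved.get(t.value, 'NAME')
    return t


# Stand-in token objects, for demonstration only (assumption, not PLY's class).
for word in ('if', 'while', 'iffy'):
    tok = t_NAME(SimpleNamespace(type='NAME', value=word))
    print(word, '->', tok.type)
```

Because the whole identifier is matched first and classified afterwards, a name such as iffy stays a NAME instead of being split into the keyword if followed by fy.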


The shift-reduce conflict is the "dangling else" problem. If you search for that phrase, you will find various solutions.

The simplest solution is to do nothing and just ignore the warning, because Ply does the right thing by default.

The second simplest solution is to add ('nonassoc', 'IF'), ('left', 'ELSE') to the precedence list, and to add a precedence marker to the if production:

'''statement : IF LPAREN comparison RPAREN statement %prec IF
             | IF LPAREN comparison RPAREN statement ELSE statement'''

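Giving ELSE a higher precedence value than IF ensures that when the parser has to choose between shifting ELSE in the second production and reducing by the first production, it chooses to shift (because ELSE has the higher precedence). In fact, that is the default behaviour, so the precedence declaration does not affect the parsing behaviour at all; it does, however, suppress the shift-reduce conflict warning, because the conflict has been resolved.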
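For reference, the full precedence table might then look like the sketch below. Two things here are assumptions: the new entries are placed at the top of the table, and 'nonassoc' is used as the associativity for IF, since PLY accepts only 'left', 'right' or 'nonassoc' in that position. The level dictionary is not part of PLY; it only makes the relative ranking visible.

```python
# Entries listed later have higher precedence, so ELSE outranks IF.
precedence = (
    ('nonassoc', 'IF'),
    ('left', 'ELSE'),
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE', 'MODULO'),
    ('right', 'UMINUS'),
)

# Illustration only: precedence level of each token, counting from 1.
level = {tok: i
         for i, (assoc, *toks) in enumerate(precedence, 1)
         for tok in toks}

print(level['IF'], level['ELSE'])
```

The production tagged with %prec IF takes the level of IF, and the lookahead token ELSE has a higher level, so the conflict is settled in favour of the shift.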

For another solution, see


Finally, please look at the comments on your question. Your actions for the if and while statements will not work at all.
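The underlying reason is that a PLY action runs when its rule is reduced. By the time p_statement_if runs, both branch statements have already been reduced, so their prints and assignments have already happened; in p_statement_while, p[3] is a value that was computed once, so the loop either never runs or never ends. One common way around this is to have the actions build a small tree and to execute that tree afterwards. The sketch below is not a rewrite of your parser; it assumes the actions would return tuples, and the tuple shapes and the names evaluate and execute are invented for the illustration.

```python
import operator

# Operator text -> function (subset, enough for the demonstration).
OPS = {'+': operator.add, '-': operator.sub,
       '<': operator.lt, '==': operator.eq}

names = {}  # variable storage, as in the question


def evaluate(node):
    """Compute the value of an expression or comparison tree."""
    kind = node[0]
    if kind == 'num':
        return node[1]
    if kind == 'name':
        return names[node[1]]
    # Remaining shape: ('binop', op, left, right)
    return OPS[node[1]](evaluate(node[2]), evaluate(node[3]))


def execute(node):
    """Run a statement tree; conditions are re-evaluated when needed."""
    kind = node[0]
    if kind == 'assign':
        names[node[1]] = evaluate(node[2])
    elif kind == 'if':
        _, cond, then_branch, else_branch = node
        if evaluate(cond):
            execute(then_branch)
        elif else_branch is not None:
            execute(else_branch)
    elif kind == 'while':
        _, cond, body = node
        while evaluate(cond):  # evaluated again on every iteration
            execute(body)


# x = 0, then: while (x < 3) x = x + 1
execute(('assign', 'x', ('num', 0)))
execute(('while',
         ('binop', '<', ('name', 'x'), ('num', 3)),
         ('assign', 'x', ('binop', '+', ('name', 'x'), ('num', 1)))))
print(names['x'])  # 3
```

With this shape, an action such as p_statement_while would only set p[0] to a ('while', p[3], p[5]) tuple, and the interpreter loop would call execute on the result of the parse.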