Python PLY issue with if-else and while statements

Question

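My if and while statements keep triggering syntax errors from p_error(p), and PLY reports conflicts when it builds the parser. The problems come from the if-else and while statements, because everything worked before I added them. Any help would be greatly appreciated.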

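If possible, please don't change the implementation too much, even if it is bad practice. I just want help understanding it; I don't want a complete overhaul (that would be plagiarism).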

import ply.lex as lex
import ply.yacc as yacc

# === Lexical tokens component ===

# List of possible token names that can be produced by the lexer
# NAME: variable name, L/RPAREN: Left/Right Parenthesis
tokens = (
    'NAME', 'NUMBER',
    'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'MODULO', 'EQUALS',
    'LPAREN', 'RPAREN',
    'IF', 'ELSE', 'WHILE',
    'EQUAL', 'NOTEQ', 'LARGE', 'SMALL', 'LRGEQ', 'SMLEQ',
)

# Regular expression rules for tokens format: t_<TOKEN>
# Simple tokens: regex for literals +,-,*,/,%,=,(,) and variable names (alphanumeric)
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_MODULO  = r'%'
t_EQUALS  = r'='
t_LPAREN  = r'\('
t_RPAREN  = r'\)'
t_NAME    = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_IF      = r'if'
t_ELSE    = r'else'
t_WHILE   = r'while'
t_EQUAL   = r'\=\='
t_NOTEQ   = r'\!\='
t_LARGE   = r'\>'
t_SMALL   = r'\<'
t_LRGEQ   = r'\>\='
t_SMLEQ   = r'\<\='


# complex tokens
# number token
def t_NUMBER(t):
    r'\d+'  # digit special character regex
    t.value = int(t.value)  # convert str -> int
    return t


# Ignored characters
t_ignore = " \t"  # spaces & tabs regex

# newline character
def t_newline(t):
    r'\n+'  # newline special character regex
    t.lexer.lineno += t.value.count("\n")  # increase current line number accordingly


# error handling for invalid character
def t_error(t):
    print("Illegal character '%s'" % t.value[0])  # print error message with causing character
    t.lexer.skip(1)  # skip invalid character


# Build the lexer
lex.lex()

# === Yacc parsing/grammar component ===

# Precedence & associative rules for the arithmetic operators
# 1. Unary, right-associative minus.
# 2. Binary, left-associative multiplication, division, and modulus
# 3. Binary, left-associative addition and subtraction
# Parenthesis precedence defined through the grammar
precedence = (
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE', 'MODULO'),
    ('right', 'UMINUS'),
)

# dictionary of names (for storing variables)
names = {}

# --- Grammar:
# <statement> -> NAME = <expression> | <expression>
# <expression> -> <expression> + <expression>
#               | <expression> - <expression>
#               | <expression> * <expression>
#               | <expression> / <expression>
#               | <expression> % <expression>
#               | - <expression>
#               | ( <expression> )
#               | NUMBER
#               | NAME
# ---
# defined below using function definitions with format string/comment
# followed by logic of changing state of engine


# if statement
def p_statement_if(p):
    '''statement : IF LPAREN comparison RPAREN statement
                    | IF LPAREN comparison RPAREN statement ELSE statement'''
    if p[3]:
        p[0] = p[5]
    else:
        if p[7] is not None:
            p[0] = p[7]


def p_statement_while(p):
    'statement : WHILE LPAREN comparison RPAREN statement'
    while(p[3]):
        p[5];


# assignment statement: <statement> -> NAME = <expression>
def p_statement_assign(p):
    'statement : NAME EQUALS expression'
    names[p[1]] = p[3]  # PLY engine syntax, p stores parser engine state


# expression statement: <statement> -> <expression>
def p_statement_expr(p):
    'statement : expression'
    print(p[1])


# comparison
def p_comparison_binop(p):
    '''comparison : expression EQUAL expression
                          | expression NOTEQ expression
                          | expression LARGE expression
                          | expression SMALL expression
                          | expression LRGEQ expression
                          | expression SMLEQ expression'''
    if p[2] == '==':
        p[0] = p[1] == p[3]
    elif p[2] == '!=':
        p[0] = p[1] != p[3]
    elif p[2] == '>':
        p[0] = p[1] > p[3]
    elif p[2] == '<':
        p[0] = p[1] < p[3]
    elif p[2] == '>=':
        p[0] = p[1] >= p[3]
    elif p[2] == '<=':
        p[0] = p[1] <= p[3]


# binary operator expression: <expression> -> <expression> + <expression>
#                                          | <expression> - <expression>
#                                          | <expression> * <expression>
#                                          | <expression> / <expression>
#                                          | <expression> % <expression>
def p_expression_binop(p):
    '''expression : expression PLUS expression
                          | expression MINUS expression
                          | expression TIMES expression
                          | expression DIVIDE expression
                          | expression MODULO expression'''
    if p[2] == '+':
        p[0] = p[1] + p[3]
    elif p[2] == '-':
        p[0] = p[1] - p[3]
    elif p[2] == '*':
        p[0] = p[1] * p[3]
    elif p[2] == '/':
        p[0] = p[1] / p[3]
    elif p[2] == '%':
        p[0] = p[1] % p[3]


# unary minus operator expression: <expression> -> - <expression>
def p_expression_uminus(p):
    'expression : MINUS expression %prec UMINUS'
    p[0] = -p[2]


# parenthesis group expression: <expression> -> ( <expression> )
def p_expression_group(p):
    'expression : LPAREN expression RPAREN'
    p[0] = p[2]


# number literal expression: <expression> -> NUMBER
def p_expression_number(p):
    'expression : NUMBER'
    p[0] = p[1]


# variable name literal expression: <expression> -> NAME
def p_expression_name(p):
    'expression : NAME'
    # attempt to lookup variable in current dictionary, throw error if not found
    try:
        p[0] = names[p[1]]
    except LookupError:
        print("Undefined name '%s'" % p[1])
        p[0] = 0


# handle parsing errors
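def p_error(p):
    if p:
        print("Syntax error at '%s'" % p.value)
    else:
        # PLY passes None when the error occurs at the end of the input
        print("Syntax error at end of input")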


# build parser
yacc.yacc()

# start interpreter and accept input using commandline/console
while True:
    try:
        s = input('calc > ')  # get user input. use raw_input() on Python 2
    except EOFError:
        break
    yacc.parse(s)  # parse user input string

Answer 1

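Your basic problem is that your lexer never recognizes the keywords if and while (nor else), because the t_NAME pattern matches them first. The problem and a possible solution are described in section 4.3 of the Ply documentation. The issue is: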

Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).

and the regular expression for t_NAME is longer than the simple keyword patterns.
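To see why the keywords lose, here is a minimal sketch using only Python's `re` module. It is a simplified model, not PLY's actual code: it assumes the string rules are combined into a single alternation ordered by decreasing pattern length, as the quoted documentation describes, and it leaves out function rules entirely. The rule names and patterns are taken from the question.

```python
import re

# Simplified model of PLY's ordering of string-defined token rules
# (function rules are ignored here).
string_rules = {
    'NAME':  r'[a-zA-Z_][a-zA-Z0-9_]*',
    'IF':    r'if',
    'ELSE':  r'else',
    'WHILE': r'while',
    'LARGE': r'\>',
    'LRGEQ': r'\>\=',
}

# Longer regular expressions are tried first.
ordered = sorted(string_rules.items(), key=lambda item: len(item[1]), reverse=True)

# One alternation; Python's re tries the alternatives left to right.
master = re.compile('|'.join('(?P<%s>%s)' % (name, pattern) for name, pattern in ordered))

def first_token_type(text):
    """Return the rule name the combined regex assigns to the start of text."""
    match = master.match(text)
    return match.lastgroup if match else None

print(first_token_type('if'))     # NAME, not IF
print(first_token_type('while'))  # NAME, not WHILE
print(first_token_type('>='))     # LRGEQ: the longer pattern is tried before LARGE
```

The same length rule is what makes `>=` and `==` work correctly in the question's lexer; it only backfires for the keywords because the NAME pattern is the longest of all.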

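You can't fix this just by turning t_NAME into a lexer function, because function-defined tokens are checked before string-defined tokens (so NAME would still match first).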

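You can, however, make t_NAME a function that looks up the matched string in a dictionary to check whether it is a reserved word. (See the example at the end of the linked section, in the paragraph that begins "To handle reserved words...".) If you do that, you don't define t_IF, t_WHILE and t_ELSE at all.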
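A minimal sketch of that approach, following the reserved-word pattern from the PLY documentation but using the question's token names. The `SimpleNamespace` objects at the end are a hypothetical stand-in for PLY's token objects, so the rule can be tried without building a lexer; in the real module the t_IF, t_ELSE and t_WHILE string rules are deleted and `lex.lex()` picks up this function instead.

```python
from types import SimpleNamespace

# Map each reserved word to its token name.
reserved = {
    'if': 'IF',
    'else': 'ELSE',
    'while': 'WHILE',
}

# IF, ELSE and WHILE must still be listed in tokens, even though
# there are no t_IF / t_ELSE / t_WHILE rules any more.
tokens = [
    'NAME', 'NUMBER',
    'PLUS', 'MINUS', 'TIMES', 'DIVIDE', 'MODULO', 'EQUALS',
    'LPAREN', 'RPAREN',
    'EQUAL', 'NOTEQ', 'LARGE', 'SMALL', 'LRGEQ', 'SMLEQ',
] + list(reserved.values())

def t_NAME(t):
    r'[a-zA-Z_][a-zA-Z0-9_]*'
    # Reclassify keywords; every other identifier stays a NAME.
    t.type = reserved.get(t.value, 'NAME')
    return t

# Stand-in tokens (hypothetical; PLY would pass a LexToken here).
print(t_NAME(SimpleNamespace(type='NAME', value='while')).type)  # WHILE
print(t_NAME(SimpleNamespace(type='NAME', value='whilst')).type)  # NAME
```

Because the lookup happens after the whole identifier is matched, a name like `iffy` stays a NAME instead of being split into `if` plus `fy`.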

The shift-reduce conflict is the "dangling else" problem. If you search for that phrase, you will find various solutions.

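The simplest solution is to do nothing and just ignore the warning, because by default Ply does the right thing: it resolves shift-reduce conflicts by shifting, so an else attaches to the nearest if.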

The second simplest solution is to add ('left', 'IF'), ('left', 'ELSE') to the precedence list and add a precedence marker to the if production:

'''statement : IF LPAREN comparison RPAREN statement %prec IF
             | IF LPAREN comparison RPAREN statement ELSE statement'''

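Giving ELSE a higher precedence than IF ensures that when the parser has to choose between shifting ELSE (second production) and reducing by the first production, it chooses the shift, because ELSE has the higher precedence. In fact, that is already the default behaviour, so the precedence declaration does not change how anything is parsed; it does, however, suppress the shift-reduce conflict warning, because the conflict is now explicitly resolved.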
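Put together, the precedence tuple and the if rule might look like the sketch below, assuming the rest of the question's grammar stays as it is. PLY lists precedence levels from lowest to highest, so ELSE, declared after IF, is the higher one. The action also checks `len(p)`, since the production without ELSE has no `p[7]` and indexing it raises an IndexError; this does not address the larger problem with the if and while actions.

```python
# Precedence levels, lowest first: ELSE outranks IF, so the parser shifts ELSE.
precedence = (
    ('left', 'IF'),
    ('left', 'ELSE'),
    ('left', 'PLUS', 'MINUS'),
    ('left', 'TIMES', 'DIVIDE', 'MODULO'),
    ('right', 'UMINUS'),
)

def p_statement_if(p):
    '''statement : IF LPAREN comparison RPAREN statement %prec IF
                 | IF LPAREN comparison RPAREN statement ELSE statement'''
    # Without ELSE the production has 5 symbols, so p has 6 slots (p[0]..p[5]);
    # with ELSE it has 8, and only then does p[7] exist.
    if p[3]:
        p[0] = p[5]
    elif len(p) == 8:
        p[0] = p[7]
```

PLY calls the action with a production object that supports `len()` and indexing, so a plain list can stand in for it when trying the action by hand.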

For another solution, see the linked answer.

Finally, please look at the comments on your question. Your actions for the if and while statements do not work at all.
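The comments themselves are not reproduced here, so what follows is one common fix, not necessarily what the commenters proposed. PLY runs each action when its production is reduced, bottom-up, so by the time p_statement_if or p_statement_while runs, the comparison and the inner statement(s) have already been executed once; `p[5]` is just the stored result, and `while(p[3]): p[5];` either does nothing or loops forever because `p[3]` never changes. A usual remedy is to have the actions build a tree and evaluate it afterwards, for example on the value returned by `yacc.parse(s)`. The tuple node shapes below are hypothetical, chosen for this sketch.

```python
import operator

# Hypothetical node shapes the grammar actions could build, e.g.
#   p_statement_while:   p[0] = ('while', p[3], p[5])
#   p_expression_number: p[0] = ('num', p[1])
BINOPS = {'+': operator.add, '-': operator.sub, '*': operator.mul,
          '/': operator.truediv, '%': operator.mod}
CMPOPS = {'==': operator.eq, '!=': operator.ne, '>': operator.gt,
          '<': operator.lt, '>=': operator.ge, '<=': operator.le}

def evaluate(node, names, output):
    """Evaluate one tree node; printed values are appended to output."""
    kind = node[0]
    if kind == 'num':
        return node[1]
    if kind == 'name':
        return names.get(node[1], 0)  # undefined names read as 0, as in the question
    if kind == 'binop':
        return BINOPS[node[1]](evaluate(node[2], names, output),
                               evaluate(node[3], names, output))
    if kind == 'neg':
        return -evaluate(node[1], names, output)
    if kind == 'cmp':
        return CMPOPS[node[1]](evaluate(node[2], names, output),
                               evaluate(node[3], names, output))
    if kind == 'assign':
        names[node[1]] = evaluate(node[2], names, output)
        return None
    if kind == 'print':  # the question's "statement : expression"
        output.append(evaluate(node[1], names, output))
        return None
    if kind == 'if':
        # Only the chosen branch is executed.
        if evaluate(node[1], names, output):
            evaluate(node[2], names, output)
        elif node[3] is not None:
            evaluate(node[3], names, output)
        return None
    if kind == 'while':
        # The condition is re-evaluated on every iteration.
        while evaluate(node[1], names, output):
            evaluate(node[2], names, output)
        return None
    raise ValueError('unknown node type %r' % (kind,))

# Trees for three input lines: x = 0 / while (x < 3) x = x + 1 / x
names = {}
output = []
program = [
    ('assign', 'x', ('num', 0)),
    ('while', ('cmp', '<', ('name', 'x'), ('num', 3)),
              ('assign', 'x', ('binop', '+', ('name', 'x'), ('num', 1)))),
    ('print', ('name', 'x')),
]
for stmt in program:
    evaluate(stmt, names, output)
print(output)  # [3]
```

Separating "build the tree" from "run the tree" is what lets a loop body execute many times, or an untaken branch not at all.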
