Python parser ply does not handle spaces

I am using ply to parse data, and I am trying to use a space as part of a lexeme. Here is a simplified example:

from ply.lex import lex
from ply.yacc import yacc

tokens = ('NUM', 'SPACE')

t_NUM = r'\d+'
t_SPACE = r' '

def t_error(t):
    print(f'Illegal character {t.value[0]!r}')
    t.lexer.skip(1)

lexer = lex()

def p_two(p):
    '''
    two : NUM SPACE NUM
    '''
    p[0] = ('two', p[1], p[2], p[3])

def p_error(p):
    if p:
        print(f"Syntax error at '{p.value}'")
    else:
        print("Syntax error at EOF")

parser = yacc()

ast = parser.parse('1 2')
print(ast)

When I run it, I get the following error:

ERROR: Regular expression for rule 't_SPACE' matches empty string
Traceback (most recent call last):
  File "c:\demo\simple_space.py", line 19, in <module>
    lexer = lex()
  File "C:\demordparty\ply\ply\lex.py", line 752, in lex
    raise SyntaxError("Can't build lexer")
SyntaxError: Can't build lexer

Is it possible to specify a space as part of a lexeme? A few additional notes:

I don't know why it doesn't work, but it does seem to work with sly, which was created by the same author (though a few years later, so he had the experience of writing ply behind him):

from sly import Lexer, Parser

class MyLexer(Lexer):
    tokens = { NUM, SPACE }

    NUM = r'\d+'
    SPACE = r' '

    def error(self, t):
        print(f'Illegal character {t.value[0]!r}')
        self.index += 1

class MyParser(Parser):
    tokens = MyLexer.tokens

    @_('NUM SPACE NUM')
    def two(self, p):
        return ('two', p.NUM0, p.SPACE, p.NUM1)
        

lexer = MyLexer()
parser = MyParser()

ast = parser.parse(lexer.tokenize('1 2'))
print(ast)

Edit:

The interesting part is the text 't_SPACE' matches empty string, which suggested to me that the space character might have some special meaning, so I tested "\ " - and it works:

t_SPACE = r'\ '

The ply manual explains this in the section Specification of tokens:

Internally, lex.py uses the re module to do its pattern matching. Patterns are compiled using the re.VERBOSE flag which can be used to help readability. However, be aware that unescaped whitespace is ignored and comments are allowed in this mode. If your pattern involves whitespace, make sure you use \s. If you need to match the # character, use [#].

So a literal space character has to be written as "\ " or "[ ]". ("\s", as suggested in the manual, matches any whitespace character, not just the space.)
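
The behaviour described in the manual can be checked directly with the standard-library re module, which ply uses internally; a minimal sketch, independent of ply itself:

```python
import re

# Under re.VERBOSE, unescaped whitespace inside the pattern is ignored,
# so the pattern r' ' effectively becomes the empty pattern and matches
# the empty string -- exactly what ply's lexer-building check rejects.
plain = re.compile(r' ', re.VERBOSE)
print(repr(plain.match('1 2').group()))    # '' (empty match at position 0)

# Escaping the space, or putting it in a character class, restores the
# literal match of a single space character.
escaped = re.compile(r'\ ', re.VERBOSE)
charclass = re.compile(r'[ ]', re.VERBOSE)
print(escaped.match(' ') is not None)      # True
print(charclass.match(' ') is not None)    # True
```

This is also why the fix is purely a matter of the token's regular expression: ply itself has no special handling of spaces beyond compiling every t_* pattern with re.VERBOSE.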