PyParsing ignores newline?

Question

I want to parse a git log file that looks like this:

d2436fa AuthorName 2015-05-15 Commit Message
4    3    README.md

The output I expect looks like this:

[ ['d2436fa', 'AuthorName', '2015-05-15', 'Commit Message'],
[4, 3, 'README.md'] ]

The grammar I use to parse this is:

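from pyparsing import (Word, OneOrMore, Regex, Combine, LineStart, LineEnd,
                       alphanums, alphas, alphas8bit, nums, printables)

hsh = Word(alphanums, exact=7)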
author = OneOrMore(Word(alphas + alphas8bit + '.'))
date = Regex(r'\d{4}-\d{2}-\d{2}')
message = OneOrMore(Word(printables + alphas8bit))
count = Word(nums)
file = Word(printables)
blankline = LineStart() + LineEnd()

commit = hsh + Combine(author, joinString=' ', adjacent=False) + \
         date + Combine(message, joinString=' ', adjacent=False) + LineEnd()
changes = count + count + file + LineEnd()
check = commit ^ changes ^ blankline

The output I actually get is:

['d2436fa', 'AuthorName', '2015-05-15', 'Commit Message 4 3 README.md']

Why is the newline being ignored? I thought that was what LineEnd() was for? When I split the input on '\n' first, everything works fine :/

Answer 1

pyparsing has a (controversial?) rule about whitespace in grammars:

During the matching process, whitespace between tokens is skipped by default (although this can be changed)

And, as it says, this can be changed. You can set what pp treats as whitespace like this:

i_consider_whitespaces_to_be_only = ' '
ParserElement.setDefaultWhitespaceChars(i_consider_whitespaces_to_be_only)

(This tells it to skip only spaces, not newlines; you can of course add other characters, such as tabs.)
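
To see the effect in isolation, here is a minimal sketch using a made-up two-line input rather than the full git log grammar. It assumes the setting is applied before the elements that should honour it are created, since elements pick up the default whitespace set when they are constructed. The camelCase names follow the question; newer pyparsing releases also offer snake_case spellings such as set_default_whitespace_chars.

```python
from pyparsing import OneOrMore, ParserElement, Word, alphas

# Hypothetical sample input: two lines separated by a newline.
text = "Commit Message\nREADME"

# Built with the default whitespace set (space, tab, CR, LF),
# so the newline between the two lines is silently skipped.
default_words = OneOrMore(Word(alphas))
skipped = default_words.parseString(text).asList()
print(skipped)  # ['Commit', 'Message', 'README']

# From here on, newly created elements skip only spaces and tabs.
ParserElement.setDefaultWhitespaceChars(' \t')

# Built after the change: matching stops at the newline.
line_words = OneOrMore(Word(alphas))
kept = line_words.parseString(text).asList()
print(kept)  # ['Commit', 'Message']
```

Because this changes a library-wide default, it affects every grammar defined afterwards in the same process. If only a few elements should stop at newlines, calling setWhitespaceChars(' \t') on those elements individually is a narrower alternative.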
