How to handle all possible C-like block comment styles in PyPEG

Question

After giving up on parsimonious I tried PyPEG. I've had much more success, in that I've achieved my initial goal, but I can't seem to handle comments correctly.

I've distilled the problem into the following code.

As you can see, not all test cases work: if a block comment is preceded by code (test cases 4 and 5), a Line is generated rather than a BlockComment.

Is there a way to get PyPEG to do this itself, or do I need to post-process the Lines to find BlockComments that span multiple Lines?

import pypeg2 as pp
import re
import pprint

nl = pp.RegEx(r"[\r\n]+")
symbols = "\"\-\[\]\!#$%&'()¬*+£,./:;<=>?@^_‘{|}~"

text = re.compile(r"[\w" + symbols + "]+", re.UNICODE)


# Partial definition as we use it before it's fully defined
class Code(pp.List):
    pass


class Text(str):
    grammar = text


class Line(pp.List):
    grammar = pp.maybe_some(Text), nl


class LineComment(Line):
    grammar = re.compile(r".*?//.*"), nl


class BlockComment(pp.Literal):
    grammar = pp.comment_c, pp.maybe_some(Text)


Code.grammar = pp.maybe_some([BlockComment, LineComment, Line])


comments = """
/*
Block comment 1
*/

// Line Comment1

Test2 // EOL Comment2

/*
Block comment 2*/

/* Block
comment 3 */

Test4 start /*
Block comment 4
*/ Test4 end

Test5 start /* Block comment 5 */ Test5 end

      /* Block comment 6 */

"""

parsed = pp.parse(comments, Code, whitespace=pp.RegEx(r"[ \t]"))
pprint.pprint(list(parsed))

Answer 1

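Your text pattern also matches comments. Because it is applied greedily, a comment can never be matched unless it happens to be at the start of a line. So you need to make sure that text matching stops when a comment delimiter is encountered.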
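The effect can be checked with plain `re`, without PyPEG. This is only a minimal sketch: the pattern is copied from the question (written as a raw string here), and the sample inputs are made up for illustration.

```python
import re

# Character class from the question; "/" and "*" are both in the list.
symbols = r"\"\-\[\]\!#$%&'()¬*+£,./:;<=>?@^_‘{|}~"
text = re.compile(r"[\w" + symbols + "]+", re.UNICODE)

# The comment opener is itself a valid Text token...
opener = text.match("/* Block comment 5 */").group()
print(opener)  # /*

# ...and without whitespace in between, a whole comment is swallowed.
swallowed = text.match("Test5/*c*/").group()
print(swallowed)  # Test5/*c*/
```

Once some code on the line has been consumed as Text, the following `/*` is consumed as Text too, so the BlockComment alternative is never tried at that position.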

You could try something like this:

# I removed / from the list.
symbols = "\"\-\[\]\!#$%&'()¬*+£,.:;<=>?@^_‘{|}~"

text = re.compile(r"([\w" + symbols + "]|/(?![/*]))+", re.UNICODE)

Although I have to say that the list of symbols seems somewhat arbitrary to me. I would have used:

text = re.compile(r"([^/\r\n]|/(?![/*]))+", re.UNICODE)
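As a quick sanity check, both patterns can be exercised with plain `re`, independently of PyPEG. The names `text_v1` and `text_v2` and the sample strings are assumptions made for this sketch. It only shows where the regexes stop matching, not how the full grammar behaves.

```python
import re

# First suggestion: the question's symbol list with "/" removed
# (raw string so the backslashes reach the regex engine as written).
symbols = r"\"\-\[\]\!#$%&'()¬*+£,.:;<=>?@^_‘{|}~"
text_v1 = re.compile(r"([\w" + symbols + r"]|/(?![/*]))+", re.UNICODE)

# Second suggestion: anything except "/" and line breaks, plus a lone "/".
text_v2 = re.compile(r"([^/\r\n]|/(?![/*]))+", re.UNICODE)

line = "Test5 start /* Block comment 5 */ Test5 end"

for pattern in (text_v1, text_v2):
    # Neither pattern can start on a comment delimiter...
    assert pattern.match("/* Block comment 5 */") is None
    assert pattern.match("// Line Comment1") is None
    # ...but a "/" that does not open a comment is still plain text.
    assert pattern.match("a/b").group() == "a/b"

v1_word = text_v1.match("start/*").group()
v2_run = text_v2.match(line).group()
print(repr(v1_word))  # 'start'
print(repr(v2_run))   # 'Test5 start '
```

Note that the second pattern also matches spaces and tabs, so a single match can cover several words up to the comment delimiter, whereas the first stops at whitespace.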
