pyparsing: ignore any token that doesn't match

Question

I have a file from a game that I am trying to parse. Here is an excerpt:

    <stage> id: 50  #Survival Stage
            <phase> bound: 1500  # phase 0   bandit
                    music: bgm\stage4.wma
                    id: 122  x: 100  #milk  ratio: 1
                    id: 30 hp: 50  times: 1
                    id: 30 hp: 50  times: 1  ratio: 0.7
                    id: 30 hp: 50  times: 1  ratio: 0.3
            <phase_end>
    <stage_end>

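A # marks a comment, but only for human readers, not for the game's parser. The first two comments run to the end of the line, but the ratio: 1 after #milk is not part of the comment; the game actually reads it. I think the game's parser ignores any token it doesn't understand. Is there a way to do this in pyparsing?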

I tried parser.ignore(pp.Word(pp.printables)), but that made it skip everything. Here is my code so far:

    import pyparsing as pp

    # Raw string, so the backslash in the music path is kept literally.
    txt = r"""
    <stage> id: 50  #Survival Stage
            <phase> bound: 1500  # phase 0   bandit
                    music: bgm\stage4.wma
                    id: 122  x: 100  #milk  ratio: 1
                    id: 30 hp: 50  times: 1
                    id: 30 hp: 50  times: 1  ratio: 0.7
                    id: 30 hp: 50  times: 1  ratio: 0.3
            <phase_end>
    <stage_end>
    """

    phase = pp.Literal('<phase>')
    stage = pp.Literal('<stage>') + pp.Literal('id:') + pp.Word(pp.nums)('id') + pp.OneOrMore(phase)
    parser = stage

    # Attempted fix: this skips every token, not only the unknown ones.
    parser.ignore(pp.Word(pp.printables))

    print(parser.parseString(txt).dump())

Answer 1

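It turns out that in the vanilla game files, ratio: is the only keyword that ever appears after a #, so I used it to define the end of a comment, like this: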

    # Uses the same "import pyparsing as pp" as the question's code.
    parser.ignore(pp.Suppress('#') + pp.SkipTo(pp.MatchFirst([pp.FollowedBy('ratio:'), pp.LineEnd()])))
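
For completeness, here is a minimal sketch of how that ignore expression might fit into a fuller parser for the excerpt. The grammar itself (the names number, key, pair, id_pair, spawn, phase and stage, and the assumption that a phase is a bound, an optional music path, and then a run of id: entries) is hypothetical and inferred only from the sample in the question, not from the game's real format. Only the comment expression comes from the answer above.

```python
import pyparsing as pp

# Sample from the question; a raw string keeps the backslash in the music path.
SAMPLE = r"""
<stage> id: 50  #Survival Stage
        <phase> bound: 1500  # phase 0   bandit
                music: bgm\stage4.wma
                id: 122  x: 100  #milk  ratio: 1
                id: 30 hp: 50  times: 1
                id: 30 hp: 50  times: 1  ratio: 0.7
                id: 30 hp: 50  times: 1  ratio: 0.3
        <phase_end>
<stage_end>
"""

# Hypothetical grammar, guessed from the excerpt only.
number = pp.Regex(r'\d+(?:\.\d+)?')
key = pp.Word(pp.alphas) + pp.Suppress(':')
pair = pp.Group(key + number)  # e.g. ['hp', '50']
id_pair = pp.Group(pp.Keyword('id') + pp.Suppress(':') + number)

# A spawn entry starts at "id:" and runs until the next "id:" or a tag.
spawn = pp.Group(id_pair + pp.ZeroOrMore(~pp.Keyword('id') + pair))

phase = pp.Group(
    pp.Suppress('<phase>')
    + pp.Suppress('bound:') + number('bound')
    + pp.Optional(pp.Suppress('music:') + pp.Word(pp.printables)('music'))
    + pp.Group(pp.ZeroOrMore(spawn))('spawns')
    + pp.Suppress('<phase_end>')
)
stage = (
    pp.Suppress('<stage>')
    + pp.Suppress('id:') + number('id')
    + pp.Group(pp.OneOrMore(phase))('phases')
    + pp.Suppress('<stage_end>')
)

# The answer's rule: a comment runs from '#' up to 'ratio:' or the end of
# the line, whichever comes first. Neither terminator is consumed.
comment = pp.Suppress('#') + pp.SkipTo(
    pp.MatchFirst([pp.FollowedBy('ratio:'), pp.LineEnd()])
)

# Call ignore() after the grammar is fully built so it reaches every sub-expression.
stage.ignore(comment)

result = stage.parseString(SAMPLE)
print(result.dump())
```

With this sketch, the entry for id 122 should keep its ratio value even though it follows #milk, while the two line comments are dropped. As for the original attempt: ignore expressions are tried before every token, so an ignore pattern as broad as pp.Word(pp.printables) consumes each token before the grammar gets a chance to match it, which is why everything was skipped.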
