How to parse multiple lines using Ruby Treetop?

I am new to Ruby Treetop.

I went through this tutorial and came up with the following set of rules.

grammar Sexp

  rule body
    commentPortString *(I am stuck here)*
  end

  rule interface
    space? (intf / intfWithSize) space? ('\n' / end_of_file) <Interface>
  end

  rule commentPortString
    space? '//' space portString space? ('\n' / end_of_file) <CommentPortString>
  end

  rule portString
    'Port' space? '.' <PortString>
  end

  rule expression
    space? '(' body ')' space? <Expression>
  end

  rule intf
    (input / output) space wire:wireName space? ';' <Intf>
  end

  rule intfWithSize
    (input / output) space? width:ifWidth space? wire:wireName space? ';' <IntfWithSize>
  end

  rule input
    'input'
  end

  rule output
    'output'
  end

  rule ifWidth
    '[' space? msb:digits space? ':' space? lsb:digits ']' <IfWidth>
  end

  rule digits
    [0-9]+
  end

  rule integer
    ('+' / '-')? [0-9]+ <IntegerLiteral>
  end

  rule float
    ('+' / '-')? [0-9]+ (('.' [0-9]+) / ('e' [0-9]+)) <FloatLiteral>
  end

  rule string
    '"' ('\"' / !'"' .)* '"' <StringLiteral>
  end

  rule signalTypeString
    '"' if_sig_name:signalType '"' <SignalTypeString>
  end

  rule signalType
    [a-zA-Z] [a-zA-Z0-9_]* (receiveLiteral / transmitLiteral) <SignalType>
  end

  rule receiveLiteral
    '.receive'
  end

  rule transmitLiteral
    '.transmit'
  end

  rule identifier
    [a-zA-Z\=\*] [a-zA-Z0-9_\=\*]* <Identifier>
  end

  rule wireName
    [a-zA-Z] [a-zA-Z0-9_]* <WireName>
  end

  rule non_space
    !space .
  end

  rule space
    [\s\t]+
  end

  rule newLine
    [\n\r]+
  end

  rule end_of_file
    !.
  end

end

I want the parser to extract a blob like the one shown below. It always starts with Port. and ends with a blank line.

    // Port.
    output        send;
    input         free;
    output        fgcg;
    output[  2:0] state_id;
    output[  1:0] stream_id;
`ifdef SIMULATION
    output[ 83:0] dbg_id;
`endif

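The rules above recognize every line in the text when each line is passed in on its own, but I can't extract the blob as a whole. Also, I only want to extract the matching text and ignore the rest.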

Can someone point me in the right direction?

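Based on what you're looking for, something like the grammar below should work. Without more information it's hard to fully understand your problem.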

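Your space rule includes \s, and \s already matches \n. A trailing space? therefore consumes the newline, and the '\n' that the rule looks for next is no longer there, so the parse fails. If you change the space rule to [^\S\n]+, it excludes \n, so you can look for the newline explicitly.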
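
The difference between the two character classes can be checked in plain Ruby. This is only a sketch, and it assumes that Treetop character classes follow the same semantics as Ruby regexp character classes; the variable names are illustrative.

```ruby
# Assumption: Treetop character classes behave like the equivalent Ruby regexp classes.
greedy_space = /\A[\s\t]+/    # original `space` rule: \s also matches "\n"
inline_space = /\A[^\S\n]+/   # revised `space` rule: any whitespace except "\n"

# The text that follows "// Port." in the input: trailing blanks, newline, next line.
tail = "  \n    output send;"

greedy = tail[greedy_space]   # runs through the newline into the next line's indent
inline = tail[inline_space]   # stops in front of the newline

puts greedy.inspect
puts inline.inspect
```

With the original class nothing is left for the explicit '\n' to match, while the revised class leaves the newline in place for the rule to consume.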

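If you want a completely blank line to end the Port. block, you should look for it explicitly: the "\n" that ends the last line, followed by ("\n" / end_of_file).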

Hope this makes sense...

grammar Sexp

  rule body
    commentPortString interface* portEnd
  end

  rule interface
    space? (intf / intfWithSize) space? "\n" <Interface>
  end

  rule commentPortString
    space? '//' space? portString space? "\n" <CommentPortString>
  end

  rule portString
    'Port' space? '.' <PortString>
  end

  # Port block ends with a blank line
  rule portEnd
    "\n" / end_of_file
  end

  rule expression
    space? '(' body ')' space? <Expression>
  end

  rule intf
    (input / output) space wire:wireName space? ';' <Intf>
  end

  rule intfWithSize
    (input / output) space? width:ifWidth space? wire:wireName space? ';' <IntfWithSize>
  end

  rule input
    'input'
  end

  rule output
    'output'
  end

  rule ifWidth
    '[' space? msb:digits space? ':' space? lsb:digits ']' <IfWidth>
  end

  rule digits
    [0-9]+
  end

  rule integer
    ('+' / '-')? [0-9]+ <IntegerLiteral>
  end

  rule float
    ('+' / '-')? [0-9]+ (('.' [0-9]+) / ('e' [0-9]+)) <FloatLiteral>
  end

  rule string
    '"' ('\"' / !'"' .)* '"' <StringLiteral>
  end

  rule signalTypeString
    '"' if_sig_name:signalType '"' <SignalTypeString>
  end

  rule signalType
    [a-zA-Z] [a-zA-Z0-9_]* (receiveLiteral / transmitLiteral) <SignalType>
  end

  rule receiveLiteral
    '.receive'
  end

  rule transmitLiteral
    '.transmit'
  end

  rule identifier
    [a-zA-Z\=\*] [a-zA-Z0-9_\=\*]* <Identifier>
  end

  rule wireName
    [a-zA-Z] [a-zA-Z0-9_]* <WireName>
  end

  rule non_space
    !space .
  end

  rule space
    [^\S\n]+
  end

  rule newLine
    [\n\r]+
  end

  rule end_of_file
    !.
  end

end
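
For the "extract only the matching text and ignore the rest" part, one option is to cut the block out in plain Ruby first and hand just that text to the parser. The sketch below is not Treetop. `extract_port_block` is a hypothetical helper, and it assumes the block starts at a `// Port.` comment line and ends at the first blank line or at end of input. The grammar above also has no rule for the `` `ifdef `` and `` `endif `` lines in the sample, so the sketch drops lines that start with a backtick; whether that is what you want is an assumption.

```ruby
# Hypothetical helper: returns the text of the first "// Port." block, or nil.
# Assumes the block ends at the first blank line (or at end of input).
def extract_port_block(source)
  lines = source.lines
  # Find the comment line that opens the block; [^\S\n] is whitespace except "\n".
  start = lines.index { |line| line =~ %r{\A[^\S\n]*//[^\S\n]*Port[^\S\n]*\.[^\S\n]*$} }
  return nil unless start

  block = lines.drop(start).take_while { |line| !line.strip.empty? }
  # Assumption: preprocessor directives (`ifdef, `endif) are not wanted.
  block.reject { |line| line.lstrip.start_with?('`') }.join
end

source = <<~'TEXT'
  module foo;
  // Port.
  output        send;
  `ifdef SIMULATION
  output[ 83:0] dbg_id;
  `endif

  wire unrelated;
TEXT

puts extract_port_block(source)
```

The returned string ends with the newline of the last port line, so the `portEnd` rule above would match it through its `end_of_file` alternative.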