How to parse a sequence of words separated by double spaces using FParsec?

Given the input:

alpha beta gamma  one two three

how can I parse it into the following?

[["alpha"; "beta"; "gamma"]; ["one"; "two"; "three"]]

If the group separator were something nicer (e.g. __), I could write this, because then

sepBy (sepBy word (pchar ' ')) (pstring "__")

works. With a double space, however, the pchar in the first (inner) sepBy consumes the first space, and then the parser fails.
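The failure can be reproduced with a short F# Interactive script. This is a sketch that assumes FParsec is loaded from NuGet via `#r "nuget: FParsec"`; the letters-only `word` parser and the trailing `.>> eof` are illustrative choices, not part of the question. The inner `sepBy` consumes the first of the two spaces, `word` then fails on the second one, and since the separator has already consumed input, FParsec reports an error instead of handing control back to the outer loop:

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical word parser: one or more ASCII letters.
let word : Parser<string, unit> = many1Satisfy isAsciiLetter

// The naive grammar from the question; `.>> eof` requires the whole input to be consumed.
let naive : Parser<string list list, unit> =
    sepBy (sepBy word (pchar ' ')) (pstring "  ") .>> eof

match run naive "alpha beta gamma  one two three" with
| Success (result, _, _) -> printfn "Success: %A" result
| Failure (msg, _, _) -> printfn "Failure:\n%s" msg   // this branch is taken
```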

I'd suggest replacing sepBy word (pchar ' ') with something like this:

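// attempt rewinds the consumed space when a second space follows;
// without it, the inner sepBy fails instead of stopping before "  "
let pOneSpace = attempt (pchar ' ' .>> notFollowedBy (pchar ' '))
let pTwoSpaces = pstring "  "
// Or if two spaces are allowed as separators but *not* three spaces:
// let pTwoSpaces = pstring "  " .>> notFollowedBy (pchar ' ')
sepBy (sepBy word pOneSpace) pTwoSpaces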

Note: untested (I don't have time right now), just typed into the answer box. So test it, in case I got something wrong somewhere.
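One way to test the suggestion is the script below, a sketch assuming FParsec from NuGet and a hypothetical letters-only `word` parser. It wraps `pOneSpace` in `attempt`: without it, `pchar ' '` has already consumed the first space when `notFollowedBy` fails, so the inner `sepBy` fails instead of stopping in front of the double space.

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical word parser: one or more ASCII letters.
let word : Parser<string, unit> = many1Satisfy isAsciiLetter

// A single space that is not followed by another space.
// `attempt` rewinds the consumed space when the check fails,
// so the inner sepBy stops cleanly in front of a double space.
let pOneSpace : Parser<char, unit> = attempt (pchar ' ' .>> notFollowedBy (pchar ' '))
let pTwoSpaces : Parser<string, unit> = pstring "  "

// `.>> eof` (an addition) makes leftover input an error.
let pGroups : Parser<string list list, unit> =
    sepBy (sepBy word pOneSpace) pTwoSpaces .>> eof

match run pGroups "alpha beta gamma  one two three" with
| Success (result, _, _) -> printfn "%A" result
| Failure (msg, _, _) -> printfn "Failure:\n%s" msg
```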

The FParsec manual says that in sepBy p sep, if sep succeeds and the subsequent p fails (without changing the state), the entire sepBy fails, too. Hence, your goals are:

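  1. make the separator *fail* if it encounters more than one space character;
  2. backtrack, so that the "inner" sepBy loop ends happily and hands control over to the "outer" sepBy loop.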
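The effect of `attempt` on goal 2 can be observed in isolation with FParsec's `opt`, which returns `None` only if its argument fails without consuming input, and fails otherwise. In this sketch (FParsec loaded from NuGet; `sepNoAttempt`, `sepWithAttempt` and `show` are illustrative names), the version without `attempt` leaves a space consumed, so `opt` fails too, whereas the `attempt` version rewinds to column 1 and `opt` yields `None`:

```fsharp
#r "nuget: FParsec"
open FParsec

// Separator check without and with `attempt` (names are illustrative).
let sepNoAttempt : Parser<char, unit> = pchar ' ' .>> notFollowedBy (pchar ' ')
let sepWithAttempt : Parser<char, unit> = attempt sepNoAttempt

// Print whether the parser succeeded and where it stopped.
let show label result =
    match result with
    | Success (r, _, (pos: Position)) -> printfn "%s: Success %A, column %d" label r pos.Column
    | Failure (_, _, _) -> printfn "%s: Failure" label

// The input starts with two spaces, as at the boundary between word groups.
show "without attempt" (run (opt sepNoAttempt) "  one")   // Failure: a space was consumed
show "with attempt" (run (opt sepWithAttempt) "  one")    // Success None, column 1: nothing consumed
```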

Here's how to do both:

// this is your word parser; it can be different of course,
// I just made it as simple as possible;
let pWord = many1Satisfy isAsciiLetter

// this is the Inner separator to separate individual words
let pSepInner =
    pchar ' '
    .>> notFollowedBy (pchar ' ') // guard rule to prevent 2nd space
    |> attempt                    // a wrapper that would fail NON-fatally

// this is the Outer separator
let pSepOuter =
    pchar ' '
    |> many1  // loop

// this is the parser that would return String list list
let pMain =
    pWord
    |> sepBy <| pSepInner         // the Inner loop
    |> sepBy <| pSepOuter         // the Outer loop

Usage:

run pMain "alpha beta gamma  one two three"
Success: [["alpha"; "beta"; "gamma"]; ["one"; "two"; "three"]]
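For a complete, self-contained check, the answer's parsers can be run as an F# Interactive script. This is a sketch that assumes FParsec is loaded from NuGet; it adds type annotations (so the parsers get a concrete `unit` user state) and `.>> eof` (so leftover input counts as an error), neither of which is in the answer. Because `pSepOuter` is `many1 (pchar ' ')`, a run of three spaces also starts a new group, which the second test input exercises.

```fsharp
#r "nuget: FParsec"
open FParsec

let pWord : Parser<string, unit> = many1Satisfy isAsciiLetter

// Inner separator: exactly one space; `attempt` rewinds if a second space follows.
let pSepInner : Parser<char, unit> = attempt (pchar ' ' .>> notFollowedBy (pchar ' '))

// Outer separator: a run of one or more spaces, as in the answer.
let pSepOuter : Parser<char list, unit> = many1 (pchar ' ')

let pMain : Parser<string list list, unit> =
    sepBy (sepBy pWord pSepInner) pSepOuter .>> eof

for input in [ "alpha beta gamma  one two three"; "alpha beta gamma   one two three" ] do
    match run pMain input with
    | Success (result, _, _) -> printfn "%A" result
    | Failure (msg, _, _) -> printfn "Failure:\n%s" msg
```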