Parsec: how to find "matches" within a string

Question

How can I use parsec to parse all matching input in a string and discard the rest?

Example: I have a simple number parser, and I can find all the numbers if I know what separates them:

num :: Parser Int
num = read <$> many1 digit  -- many1: at least one digit, so read never sees ""

parse (num `sepBy` space) "" "111 4 22"

But what if I don't know what lies between the numbers?

"I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22."

many anyChar doesn't work as a separator, because it consumes everything.

So how can I get everything that matches an arbitrary parser when the matches are surrounded by things I want to ignore?

EDIT: Note that in the real problem, my parser is more complex:

optionTag :: Parser Fragment
optionTag = do
    string "<option"
    manyTill anyChar (string "value=")
    n <- many1 digit
    manyTill anyChar (char '>')
    chapterPrefix
    text <- many1 (noneOf "<>")
    return $ Option (read n) text
  where
    chapterPrefix = many digit >> char '.' >> many space

Answer 1

You can use

 many ( noneOf "0123456789")

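I'm not sure about the types of "noneOf" and "digit", so many $ noneOf digit may not typecheck (noneOf expects a list of characters, not a parser). A predicate-based equivalent is

many (satisfy (not . isDigit))  -- isDigit is from Data.Char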
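
For the number example, the idea of skipping non-digits could be wired up roughly like this. It is only a sketch: nonDigits and nums are names made up here, and num uses many1 so that it never succeeds on empty input.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A number is one or more digits.
num :: Parser Int
num = read <$> many1 digit

-- Hypothetical helper: skip any run of characters that are not digits.
nonDigits :: Parser ()
nonDigits = skipMany (noneOf "0123456789")

-- Skip leading junk, then collect numbers, skipping junk after each one.
nums :: Parser [Int]
nums = nonDigits *> many (num <* nonDigits)
```

Running parse nums "" on the sentence from the question should give Right [111,4,22]. This only works because "not a digit" is easy to state as a character class; it does not generalize to a parser like optionTag.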

Answer 2

For an arbitrary parser myParser, it's quite easy:

solution = many (let one = myParser <|> (anyChar >> one) in one)

This might be clearer written like so:

solution = many loop
    where 
        loop = myParser <|> (anyChar >> loop)

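Essentially, this defines a recursive parser (called loop) that keeps searching for the first thing that myParser can parse. many then simply searches exhaustively until failure, i.e. EOF.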
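
One caveat worth checking with the loop above: in Parsec, the right side of <|> is only tried when the left side fails without consuming input, and loop itself fails after consuming input whenever the text following the last match runs into EOF. A minimal sketch that wraps both in try, assuming the parser never succeeds on empty input (findAll is a name made up for this example):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Collect every match of p, ignoring whatever lies between matches.
-- Assumption: p always consumes input when it succeeds.
findAll :: Parser a -> Parser [a]
findAll p = many (try loop) <* many anyChar
  where
    -- try p: backtrack if p fails partway through a candidate match.
    loop = try p <|> (anyChar *> loop)

num :: Parser Int
num = read <$> many1 digit
```

Here many (try loop) stops cleanly once no further match exists, and many anyChar then discards the trailing text. The backtracking can rescan the tail of the input, so this is a simple approach rather than an efficient one.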

Answer 3

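To find an item in a string, either the item is at the start of the string, or you consume one character and look for the item in the now-shorter string. If the item isn't at the start of the string, you need to un-consume the characters used while looking for it, so you need a try block.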

hasItem = prefixItem <* (many anyChar)
prefixItem = (try item) <|> (anyChar >> prefixItem)
item = <parser for your item here>

This code looks for item only once in the string.

(AJFarmar almost had it.)
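
To make the search for a single item concrete, here is the same shape with the placeholder filled in by the number parser from the question. Using a number parser for item is just an assumption for illustration; any parser that consumes input on success should do.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Stand-in for "your item": one or more digits read as an Int.
item :: Parser Int
item = read <$> many1 digit

-- Either the item starts here, or drop one character and look again.
prefixItem :: Parser Int
prefixItem = try item <|> (anyChar >> prefixItem)

-- Find the first item, then discard the rest of the input.
hasItem :: Parser Int
hasItem = prefixItem <* many anyChar
```

On the sentence from the question this should return only the first number, and on input with no digits at all it fails with a parse error rather than returning a default.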

Answer 4

The replace-megaparsec package allows you to split up a string into sections which match your pattern and sections which don't match by using the sepCap parser combinator.

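import Data.Void (Void)  -- Void is the custom-error type used below
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char

let num :: Parsec Void String Int
    num = read <$> some digitChar  -- some: require at least one digit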

>>> parseTest (sepCap num) "I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22."
[Left "I will live to be "
,Right 111
,Left " years <b>old</b> if I work out "
,Right 4
,Left " days a week starting at "
,Right 22
,Left "."
]
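
If only the matches are wanted, the Left sections can be dropped with rights from Data.Either. A small sketch, assuming the replace-megaparsec and megaparsec packages are installed; matches is a name made up for this example:

```haskell
import Data.Either (rights)
import Data.Void (Void)
import Replace.Megaparsec (sepCap)
import Text.Megaparsec (Parsec, parseMaybe, some)
import Text.Megaparsec.Char (digitChar)

type Parser = Parsec Void String

-- One or more digits read as an Int.
num :: Parser Int
num = read <$> some digitChar

-- Keep only the Right (matching) sections and throw away the Left ones.
matches :: String -> Maybe [Int]
matches = fmap rights . parseMaybe (sepCap num)
```

For the sentence from the question, matches should return Just [111,4,22].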
