How to ignore arbitrary tokens using parsec?

Question

I want to replace sed and awk with Parsec. For example, I want to extract the number from a string like "unknown structure but containing the number 42 and maybe some other stuff".

My attempt below just fails with "unexpected end of input". I am looking for the equivalent of the non-greedy regex .*([0-9]+).*

module Main where

import Text.Parsec

parser :: Parsec String () Int
parser = do
    _ <- many anyToken
    x <- read <$> many1 digit
    _ <- many anyToken
    return x

main :: IO ()
main = interact (show . parse parser "STDIN")

Answer 1

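This doesn't work because anyToken accepts and consumes, as its name suggests, any token, including digits. You then apply it many times, and Parsec's many does not backtrack, so the first parser swallows the entire input. The second parser's attempt to read digits is therefore bound to fail: there are simply no tokens left.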

Instead, have your first parser accept any character that is not a digit (using isDigit from the module Data.Char):

parser :: Parsec String () Int
parser = do
    _ <- many $ satisfy (not . isDigit)
    x <- read <$> many1 digit
    _ <- many anyToken
    return x
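
Skipping non-digits works here because the target starts with a recognizable character class. For a closer analogue of a non-greedy .* in front of an arbitrary parser, one possible sketch is shown below. Note that skipTo and firstNumber are hypothetical names introduced for illustration and are not part of parsec; the sketch only relies on try, anyChar, many, many1 and digit from Text.Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper (not provided by parsec): try `p` at the current
-- position; if it fails, `try` rewinds the input, one character is
-- dropped, and `p` is attempted again at the next position.
skipTo :: Parser a -> Parser a
skipTo p = try p <|> (anyChar *> skipTo p)

-- Rough analogue of the non-greedy regex .*([0-9]+).*
firstNumber :: Parser Int
firstNumber = do
    x <- skipTo (read <$> many1 digit)
    _ <- many anyChar          -- discard the rest of the input
    return x

main :: IO ()
main = print (parse firstNumber "example"
    "unknown structure but containing the number 42 and maybe some other stuff")
-- prints: Right 42
```

If the input contains no match at all, skipTo eventually reaches the end of the input, where anyChar fails and the whole parse returns a Left. Retrying p at every position can be slow for expensive parsers, so this is a convenience rather than a tuned solution.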

Answer 2

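This can be done easily with my library regex-applicative. It gives you both the combinator interface and the features of regular expressions that you seem to want.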

Here is a working version that is closest to your example:

{-# LANGUAGE ApplicativeDo #-}
import Text.Regex.Applicative
import Text.Regex.Applicative.Common (decimal)

parser :: RE Char Int
parser = do
    _ <- few anySym
    x <- decimal
    _ <- many anySym
    return x

main :: IO ()
main = interact (show . match parser)

Here is an even shorter version, using findFirstInfix:

import Text.Regex.Applicative
import Text.Regex.Applicative.Common (decimal)

main :: IO ()
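-- findFirstInfix returns a Maybe (prefix, match, suffix); show the match.
main = interact (show . fmap snd3 . findFirstInfix (decimal :: RE Char Int))
  where snd3 (_, r, _) = r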

If you want to do actual lexing (e.g. skip the 93 in foo93bar), take a look at lexer-applicative, a lexer based on regex-applicative.

Answer 3

Replacing sed and awk with parsers is what the replace-megaparsec library is all about.

Extract numbers from an unstructured string with the sepCap parser combinator.

import Data.Void (Void)
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char.Lexer

parseTest (sepCap (decimal :: Parsec Void String Int))
  $ "unknown structure but containing the number 42 and maybe some other stuff"

[ Left "unknown structure but containing the number "
, Right 42
, Left " and maybe some other stuff"
]
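
The Left/Right list is what makes the sed and awk comparison concrete: the Right values are the matches and the Left values are everything in between. As a rough sketch using only base, with the result above written out as a literal so that replace-megaparsec is not required, the two typical follow-up steps could look like this. The names chunks, numbers and doubled are hypothetical, and doubling the number is an arbitrary example edit.

```haskell
import Data.Either (rights)

-- The sepCap result shown above, copied in as a literal value.
chunks :: [Either String Int]
chunks =
    [ Left "unknown structure but containing the number "
    , Right 42
    , Left " and maybe some other stuff"
    ]

-- awk-style extraction: keep only the captured numbers.
numbers :: [Int]
numbers = rights chunks

-- sed-style substitution: rewrite each match and keep the
-- surrounding text verbatim.
doubled :: String
doubled = concatMap (either id (show . (* 2))) chunks

main :: IO ()
main = do
    print numbers      -- [42]
    putStrLn doubled   -- unknown structure but containing the number 84 and maybe some other stuff
```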
