如何在 Parsec 的 monadic 上下文中 return 多次解析失败?

How to return multiple parse failures within Parsec's monadic context?

我正在解析一个语法,它由 正好两个 必需且唯一的逻辑部分组成,AlphaBeta。这些部分可以按任何顺序定义,AlphaBeta 之前或 visa-vera。我想为不太懂技术的用户提供可靠的错误消息。

在下面的示例中,存在多个解析失败的情况。我将失败消息 Strings 与在 grammarDefinition 上调用的 unlines function and pass the resulting concatenation into the fail combinator. This creates a ParseError value with a single Message value when parse 连接起来。

示例场景:

import Data.Either                   (partitionEithers)
import Data.Set                      (Set)
import Text.Parsec                   (Parsec)
import Text.Parsec.Char
import Text.ParserCombinators.Parsec

data Result = Result Alpha Beta
type Alpha  = Set (Int,Float)
type Beta   = Set String

grammarDefinition :: Parsec String u Result
grammarDefinition = do
    segments <- partitionEithers <$> many segment
    _        <- eof
    case segments of
      (     [],      []) -> fail $ unlines [missingAlpha, missingBeta]
      (      _,      []) -> fail $ missingBeta
      (     [],       _) -> fail $ missingAlpha
      ((_:_:_), (_:_:_)) -> fail $ unlines [multipleAlpha, multipleBeta]
      (      _, (_:_:_)) -> fail $ multipleBeta
      ((_:_:_),       _) -> fail $ multipleAlpha
      (    [x],     [y]) -> pure $ Result x y
    where
      missingAlpha     = message "No" "alpha"
      missingBeta      = message "No" "beta"
      multipleAlpha    = message "Multiple" "alpha"
      multipleBeta     = message "Multiple" "beta"
      message x y      = concat [x," ",y," defined in input, ","exactly one ",y," definition required"]

-- Type signature is important!
segment :: Parsec String u (Either Alpha Beta)
segment = undefined -- implementation irrelevant

我想要 ParseError to contain multiple Message values in the case of multiple failures. This should be possible due to the existence of the addErrorMessage function. I am not sure hw to supply multiple failure within the Parsec monadic context, before the result is materialized by calling parse.

示例函数:

fails :: [String] -> ParsecT s u m a
fails = undefined -- Not sure how to define this!

如何提供 多个 Message values to the ParseError 结果 within Parsec 的 monadic 上下文?

在这种情况下,

fail 等同于 Text.Parsec.Prim:

中定义的 parserFail
parserFail :: String -> ParsecT s u m a
parserFail msg
    = ParsecT $ \s _ _ _ eerr ->
      eerr $ newErrorMessage (Message msg) (statePos s)

由于 newErrorMessageaddErrorMessage 都创建了 ParseErrorparserFail 的这种变体也应该有效:

parserFail' :: String -> ParsecT s u m a
parserFail' msg
    = ParsecT $ \s _ _ _ eerr ->
      eerr $ theMessages s
where
  theMessages s =
    addErrorMessage (Message "blah") $
      addErrorMessage (Expect "expected this") $
        newErrorMessage (Message msg) (statePos s)

应该将 3 条消息推送到错误消息列表中。

也在那个模块中,看看 labellabels 这是 唯一使用 addErrorMessage 的地方。 labels只是一个多消息 <?> 运算符的版本。请注意它如何使用 foldr 来构建化合物 错误信息:

labels :: ParsecT s u m a -> [String] -> ParsecT s u m a
labels p msgs =
    ParsecT $ \s cok cerr eok eerr ->
    let eok' x s' error = eok x s' $ if errorIsUnknown error
                  then error
                  else setExpectErrors error msgs
        eerr' err = eerr $ setExpectErrors err msgs

    in unParser p s cok cerr eok' eerr'

 where
   setExpectErrors err []         = setErrorMessage (Expect "") err
   setExpectErrors err [msg]      = setErrorMessage (Expect msg) err
   setExpectErrors err (msg:msgs)
       = foldr (\msg' err' -> addErrorMessage (Expect msg') err')
         (setErrorMessage (Expect msg) err) msgs

唯一的问题是您需要访问 ParsecT 构造函数 Text.Parsec.Prim 未导出。也许你可以找到一种方法来使用 labels 或解决该问题的另一种方法。否则你总是可以包括你的 使用您的代码拥有自己的 parsec 黑客版本。

我们可以利用 ParsecT is an instance of MonadPlus to combine the definition of mzero with the function labels 这一事实来得出所需的结果:

fails :: [String] -> ParsecT s u m a
fails = labels mzero

注: ParseError has many Expect values, not many Message 值...

我建议从 Parsec to newer and more extensible Megaparsec 图书馆过渡。

This exact issue 自版本 4.2.0.0.

以来已解决

多个解析错误 Message 可以使用以下函数轻松创建:

fails :: MonadParsec m => [String] -> m a
fails = failure . fmap Message