How can I write a more general (but efficient) version of attoparsec's takeWhile1?

Data.Attoparsec.Text exports takeWhile and takeWhile1:

takeWhile :: (Char -> Bool) -> Parser Text

Consume input as long as the predicate returns True, and return the consumed input.

This parser does not fail. It will return an empty string if the predicate returns False on the first character of input.

[...]

takeWhile1 :: (Char -> Bool) -> Parser Text

Consume input as long as the predicate returns True, and return the consumed input.

This parser requires the predicate to succeed on at least one character of input: it will fail if the predicate never returns True or if there is no input left.
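Here is a small sketch that shows the difference between the two quoted contracts, using parseOnly on complete input. The helper names digitsOrEmpty and digits1 are hypothetical and exist only for illustration:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Attoparsec.Text ( parseOnly )
import qualified Data.Attoparsec.Text as A
import           Data.Char            ( isDigit )
import           Data.Text            ( Text )

-- takeWhile never fails: on "abc" it consumes nothing and returns "".
digitsOrEmpty :: Text -> Either String Text
digitsOrEmpty = parseOnly (A.takeWhile isDigit)

-- takeWhile1 needs at least one matching character, so it fails on "abc".
digits1 :: Text -> Either String Text
digits1 = parseOnly (A.takeWhile1 isDigit)
```

digitsOrEmpty "abc" should give Right "", and digits1 "abc" should give a Left. Both return Right "123" on "123abc".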

Attoparsec's documentation encourages the user to

Use the Text-oriented parsers whenever possible, e.g. takeWhile1 instead of many1 anyChar. There is about a factor of 100 difference in performance between the two kinds of parser.
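Below, the two styles the advice contrasts appear side by side. This sketch assumes that T.pack <$> many1 (satisfy p) is the character-level counterpart of takeWhile1 p. The names wordSlow and wordFast are hypothetical. Both should accept the same inputs and return the same Text. They differ only in how they get there:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import           Data.Attoparsec.Text ( Parser, many1, parseOnly, satisfy )
import qualified Data.Attoparsec.Text as A
import           Data.Char            ( isAlpha )
import           Data.Text            ( Text )
import qualified Data.Text            as T

-- Character-at-a-time: builds a [Char] list, then packs it into Text.
wordSlow :: Parser Text
wordSlow = T.pack <$> many1 (satisfy isAlpha)

-- Text-oriented: scans the input and returns the matching Text directly.
wordFast :: Parser Text
wordFast = A.takeWhile1 isAlpha
```

The first version allocates a list cell for every matched character before packing. The second avoids that intermediate list, which is the kind of overhead the documentation warns about.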

These two parsers are very handy, but I keep finding that I need a more general version of takeWhile1. More specifically, I want some hypothetical parser

takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = undefined

that would parse at least lo characters satisfying the predicate f, where lo is an arbitrary nonnegative integer.
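Written as a pure function, the intended behaviour is easy to pin down and could serve as a test oracle. This is a hypothetical model over String, and takeWhileLoSpec is not part of attoparsec. It returns the consumed prefix and the leftover input, or Nothing when fewer than lo leading characters satisfy f:

```haskell
-- Hypothetical pure model of the intended takeWhileLo semantics (not attoparsec API).
-- Returns (consumed prefix, remaining input), or Nothing when fewer than
-- lo leading characters satisfy f.
takeWhileLoSpec :: (Char -> Bool) -> Int -> String -> Maybe (String, String)
takeWhileLoSpec f lo input
  | length matched >= lo = Just (matched, rest)
  | otherwise            = Nothing
  where
    -- Longest prefix satisfying f, exactly what takeWhile would consume.
    (matched, rest) = span f input
```

Under this model, lo = 0 behaves like takeWhile and never fails. With lo = 1, it succeeds exactly when takeWhile1 does.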

I had a look at takeWhile1's implementation, but it uses a bunch of functions private to Data.Attoparsec.Text.Internal and doesn't seem easy to generalize.

I came up with the following applicative implementation:

{-# LANGUAGE OverloadedStrings #-}

import           Prelude                  hiding ( takeWhile )

import           Control.Applicative             ( (<*>) )
import           Data.Text                       ( Text )
import qualified Data.Text           as T

import           Data.Attoparsec.Text

takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo =
  T.append . T.pack <$> count lo (satisfy f) <*> takeWhile f

It works as advertised:

λ> parseOnly (takeWhileLo (== 'a') 4) "aaa"
Left "not enough input"
λ> parseOnly (takeWhileLo (== 'a') 4) "aaaa"
Right "aaaa"
λ> parseOnly (takeWhileLo (== 'a') 4) "aaaaaaaaaaaaa"
Right "aaaaaaaaaaaaa"

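However, I'm worried about having to pack the intermediate list of results returned by count, especially when lo is large. It also seems to go against the recommendation to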

use the Text-oriented parsers whenever possible [...]

Am I missing something? Is there a more efficient or idiomatic way to implement such a takeWhileLo combinator?

Parser is a monad, so you can simply check the return value and fail if the length isn't right:

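import           Prelude              hiding ( takeWhile )
import           Control.Applicative  ( empty )
import           Data.Text            ( Text )
import qualified Data.Text            as T
import           Data.Attoparsec.Text ( Parser, takeWhile )

takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = do
  text <- takeWhile f
  case T.compareLength text lo of
    LT -> empty        -- fewer than lo matching characters: fail
    _  -> return text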

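compareLength comes from the text package. It is more efficient than comparing against T.length text. T.length always traverses the whole Text, whereas compareLength can short-circuit once it has seen more than lo characters.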
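Here is a small sketch of the two ways to phrase the "at least lo characters" check. The helper names are hypothetical. Both return the same Bool. T.length must count every character, while T.compareLength is documented as able to short-circuit when the count exceeds the number it is compared with:

```haskell
import qualified Data.Text as T

-- Counts every character of t before comparing.
atLeastViaLength :: Int -> T.Text -> Bool
atLeastViaLength lo t = T.length t >= lo

-- Same answer, but compareLength may stop early once the count exceeds lo.
atLeastViaCompare :: Int -> T.Text -> Bool
atLeastViaCompare lo t = T.compareLength t lo /= LT
```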