How can I write a more general (but efficient) version of attoparsec's takeWhile1?
Data.Attoparsec.Text exports takeWhile and takeWhile1:
takeWhile :: (Char -> Bool) -> Parser Text
Consume input as long as the predicate returns True, and return the consumed input. This parser does not fail. It will return an empty string if the predicate returns False on the first character of input.
[...]
takeWhile1 :: (Char -> Bool) -> Parser Text
Consume input as long as the predicate returns True, and return the consumed input. This parser requires the predicate to succeed on at least one character of input: it will fail if the predicate never returns True or if there is no input left.
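A minimal sketch of that difference, assuming GHC 7.10 or later with the attoparsec and text packages installed. The helper names digitsOrEmpty and digitsNonEmpty are hypothetical, and isDigit is just an example predicate: takeWhile returns an empty Text when the first character does not match, whereas takeWhile1 fails.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (takeWhile)
import Data.Attoparsec.Text (parseOnly, takeWhile, takeWhile1)
import Data.Char (isDigit)
import Data.Text (Text)

-- takeWhile never fails: if no leading digit matches, it returns "".
digitsOrEmpty :: Text -> Either String Text
digitsOrEmpty = parseOnly (takeWhile isDigit)

-- takeWhile1 needs at least one matching character, otherwise it fails.
digitsNonEmpty :: Text -> Either String Text
digitsNonEmpty = parseOnly (takeWhile1 isDigit)
```

In GHCi, digitsOrEmpty "abc" gives Right "", while digitsNonEmpty "abc" gives a Left (the exact message depends on the attoparsec version). On "123abc" both return Right "123".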
The attoparsec documentation encourages users to
Use the Text-oriented parsers whenever possible, e.g. takeWhile1 instead of many1 anyChar. There is about a factor of 100 difference in performance between the two kinds of parser.
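To make the two styles concrete, here is a sketch of two parsers that accept the same input, a run of one or more digits. The names are hypothetical, and many1 (satisfy isDigit) stands in for the documentation's many1 anyChar. The factor-of-100 figure is the documentation's claim; this sketch does not measure it.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Attoparsec.Text (Parser, many1, parseOnly, satisfy, takeWhile1)
import Data.Char (isDigit)
import Data.Text (Text)
import qualified Data.Text as T

-- Character-at-a-time: satisfy yields one Char per step, many1 collects the
-- results into a [Char], and T.pack copies that list into a Text at the end.
digitsViaList :: Parser Text
digitsViaList = T.pack <$> many1 (satisfy isDigit)

-- Text-oriented: takeWhile1 returns the matching prefix of the input as Text.
digitsViaSlice :: Parser Text
digitsViaSlice = takeWhile1 isDigit

-- Run both parsers on the same input so their results can be compared.
compareBoth :: Text -> (Either String Text, Either String Text)
compareBoth input = (parseOnly digitsViaList input, parseOnly digitsViaSlice input)
```

Both succeed with "123" on "123abc" and both fail on "abc"; they differ only in how the result is built.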
These two parsers are very useful, but I keep finding myself needing a more general version of takeWhile1; more specifically, some hypothetical parser
takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = undefined
that would parse at least lo characters satisfying the predicate f, where lo is an arbitrary nonnegative integer.
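It can help to pin down the intended semantics as a pure function on Text before looking at parsers. The following is a hypothetical reference (takeWhileLoSpec is not part of attoparsec), built on Data.Text.span; it could serve as an oracle when testing a parser implementation.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical reference semantics for takeWhileLo: split off the longest
-- prefix whose characters all satisfy f, and succeed only if that prefix
-- has at least lo characters. Returns (matched prefix, remaining input).
takeWhileLoSpec :: (Char -> Bool) -> Int -> Text -> Maybe (Text, Text)
takeWhileLoSpec f lo input
  | T.length prefix >= lo = Just (prefix, rest)
  | otherwise             = Nothing
  where
    (prefix, rest) = T.span f input
```

For example, takeWhileLoSpec (== 'a') 4 (T.pack "aaa") is Nothing, and on T.pack "aaaaab" it returns Just ("aaaaa", "b").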
I had a look at takeWhile1's implementation, but it uses a bunch of functions private to Data.Attoparsec.Text.Internal and doesn't seem easy to generalize.
I came up with the following applicative implementation:
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding ( takeWhile )
import Control.Applicative ( (<$>), (<*>) )
import Data.Text ( Text )
import qualified Data.Text as T
import Data.Attoparsec.Text

-- Match exactly lo characters one at a time, then the rest of the run.
takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo =
    T.append . T.pack <$> count lo (satisfy f) <*> takeWhile f
It works as advertised:
λ> parseOnly (takeWhileLo (== 'a') 4) "aaa"
Left "not enough input"
λ> parseOnly (takeWhileLo (== 'a') 4) "aaaa"
Right "aaaa"
λ> parseOnly (takeWhileLo (== 'a') 4) "aaaaaaaaaaaaa"
Right "aaaaaaaaaaaaa"
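But having to pack the intermediate list of results returned by count worries me, especially when lo is large. It seems to go against the documentation's advice to
use the Text-oriented parsers whenever possible [...]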
Am I missing something? Is there a more efficient or idiomatic way to implement such a takeWhileLo combinator?
Parser is a monad, so you can just check the return value and fail if the length isn't right:
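import Control.Applicative ( empty )

takeWhileLo :: (Char -> Bool) -> Int -> Parser Text
takeWhileLo f lo = do
    -- Consume the longest run of characters satisfying f (never fails).
    text <- takeWhile f
    -- Fail via Alternative's empty if fewer than lo characters matched.
    case T.compareLength text lo of
        LT -> empty
        _  -> return text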
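One way to convince yourself that this version behaves like the applicative one from the question is a small agreement check. This is a sketch assuming GHC 7.10 or later (so <$> and <*> come from Prelude). The names takeWhileLoApp, takeWhileLoMonad and agree are hypothetical, and only success/failure and the returned Text are compared, not error messages.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Prelude hiding (takeWhile)
import Control.Applicative (empty)
import Data.Attoparsec.Text (Parser, count, parseOnly, satisfy, takeWhile)
import Data.Text (Text)
import qualified Data.Text as T

-- Question's version: count/satisfy builds a [Char] of length lo first.
takeWhileLoApp :: (Char -> Bool) -> Int -> Parser Text
takeWhileLoApp f lo = T.append . T.pack <$> count lo (satisfy f) <*> takeWhile f

-- Answer's version: take the whole run, then check its length.
takeWhileLoMonad :: (Char -> Bool) -> Int -> Parser Text
takeWhileLoMonad f lo = do
  text <- takeWhile f
  case T.compareLength text lo of
    LT -> empty
    _  -> return text

-- True when both versions fail together, or succeed with the same Text.
agree :: (Char -> Bool) -> Int -> Text -> Bool
agree f lo input =
  case (parseOnly (takeWhileLoApp f lo) input, parseOnly (takeWhileLoMonad f lo) input) of
    (Right a, Right b) -> a == b
    (Left _, Left _)   -> True
    _                  -> False
```

For instance, all (agree (== 'a') 4) ["", "aaa", "aaaa", "aaaaab"] evaluates to True.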
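compareLength comes from the text package. It is more efficient than computing T.length text and comparing the result, because compareLength can short-circuit: when the text has more than lo characters, it does not need to count all of them.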
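A quick illustration of that behaviour using only the text package. atLeast is a hypothetical helper showing how the answer's LT test reads as an "at least lo characters" check.

```haskell
import qualified Data.Text as T

-- T.compareLength t n returns the same Ordering as compare (T.length t) n.
-- Per the text documentation, it can short-circuit when t has more than
-- n characters instead of counting the whole Text.
atLeast :: Int -> T.Text -> Bool
atLeast lo t = T.compareLength t lo /= LT
```

For example, atLeast 4 (T.pack "aaa") is False and atLeast 4 (T.pack "aaaa") is True, matching the parser results shown in the question.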