用于构建测试数据的 monad

A monad for building test data

好的,所以我正在尝试编写一个 monad 来构建测试数据,但我不能完全按照我想要的方式工作。它看起来像这样:

runBuildM :: [i] -> BuildM i o x -> [o]
-- Given a list of i, build a list of o.

source :: BuildM i o i
-- Fetch unique i.

yield :: o -> BuildM i o ()
-- Return a new o to the caller.

gather :: BuildM i o x -> BuildM i o o
-- Fetch every possible o from sub-computation.

local :: BuildM i o x -> BuildM i o x
-- Isolate any source invocations from the rest of the code.

换句话说,它是一个供应单子、写入器单子和列表单子。这个想法是我可以写这样的东西:

build_tests depth = do
  local $ do
    v <- source
    yield v
    yield (map toLower v)
  yield "[]"
  yield "()"
  when (depth > 2) $ do
    t1 <- gather $ build_tests (depth-1)
    yield $ "(" ++ t1 ++ ")"
    yield $ "[" ++ t1 ++ "]"
    t2 <- gather $ build_tests (depth-1)
    yield $ "(" ++ t1 ++ "," ++ t2 ++ ")"

想法是生成所有可能的数据组合。你可以只使用列表理解来做到这一点,但结果在语法上很糟糕。这 更具可读性。不幸的是,它实际上 工作 ...

问题似乎归结为 local 函数运行不正常。 intention 是为了让子计算中的任何 source 调用在它之外没有任何影响。 (即,从 local 块外部对 source 的后续调用再次获得第一个标记。)但是,我对 local 实际上 的实现重置 everything 的下一个标记(即,包括子计算的内容)。这显然是不正确的,但我终其一生都无法思考如何使其正常工作。

事实上,我在让代码按要求工作时遇到了这么多麻烦,这可能意味着我的 monad 的实际内部表示只是 错误。有人可以尝试正确实施吗?


编辑: 我也许应该意识到这一点,但我实际上并没有指定我想要获得的预期结果。上面的代码应该产生这个:

["A", "a", "[]", "()", "(A)", "(a)", "[A]", "[a]", "(A, B)", "(A, b)", "(a, B)", "(a, b)"]

结果完全按此顺序出现并不是特别重要。我希望单个案例出现在复合案例之前,但我并不太在意复合案例出现的确切顺序。规则是同一个变量永远不会在任何单个表达式中出现两次。

如果我们让深度更深一点,我们还会得到诸如

"((A))", "([A])", "[(A)]", "((A, B), C)", "(A, (B, C))"

等等。


它显然已损坏,但这是我目前拥有的:

newtype BuildM i o x = BuildM ([i] -> SEQ.Seq ([i], SEQ.Seq o, x))

instance Functor (BuildM i o) where
  fmap uf (BuildM sf) =
    BuildM $ \ is0 -> do
      (is1, os, x) <- sf is0
      return (is1, os, uf x)

instance Applicative (BuildM i o) where
  pure x = BuildM $ \ is0 -> return (is0, SEQ.empty, x)

  BuildM sf1 <*> BuildM sf2 =
    BuildM $ \ is1 -> do
      (is2, os2, f) <- sf1 is1
      (is3, os3, x) <- sf2 is2
      return (is3, os2 >< os3, f x)

instance Monad (BuildM i o) where
  return = pure

  BuildM sf1 >>= uf =
    BuildM $ \ is1 -> do
      (is2, os2, x) <- sf1 is1
      let BuildM sf2 = uf x
      (is3, os3, y) <- sf2 is2
      return (is3, os2 >< os3, y)

runBuildM :: [i] -> BuildM i o x -> [o]
runBuildM is0 (BuildM sf) =
  toList $ do
    (is, os, x) <- sf is0
    os

source :: BuildM i o i
source =
  BuildM $ \ is ->
    if null is
      then error "AHC.Tests.TestBuilder.source: end of input"
      else return (tail is, SEQ.empty, head is)

yield :: o -> BuildM i o ()
yield o = BuildM $ \ is -> return (is, SEQ.singleton o, () )

gather :: BuildM i o x -> BuildM i o' o
gather (BuildM sf1) =
  BuildM $ \ is1 -> do
    (is2, os2, _) <- sf1 is1
    o <- os2
    return (is2, SEQ.empty, o)

local :: BuildM i o x -> BuildM i o ()
local (BuildM sf1) =
  BuildM $ \ is1 ->
    let os = do (is2, os2, x) <- sf1 is1; os2
    in  return (is1, os, () )

您正在尝试重新发明 pipes。您的 sourceyield 是管道 awaityield。您要处理的另外两个问题分别是 ReaderTWriterT。如果将整个输入列表放在 ReaderT 的环境中,则可以 运行 local 从列表开头重新开始的子计算。您可以通过添加 WriterT 层来收集输出来收集子计算的所有结果。

为了 gather 的良好语法,您正在尝试重新创建 ListT

管道、读取器和写入器

我们将在很短的时间内使用以下所有内容。

import Data.Functor.Identity
import Data.Foldable

import Control.Monad
import Control.Monad.Morph
import Control.Monad.IO.Class
import Control.Monad.Trans.Class
import Control.Monad.Trans.Reader hiding (local)
import Control.Monad.Trans.Writer.Strict

import Pipes.Core
import Pipes ((>->))
import qualified Pipes as P
import qualified Pipes.Prelude as P

import Pipes.Lift (runWriterP, runReaderP)

您的构建器是 Pipe i oReader [i],它允许您在输入开始时重置。我们将定义它的两个版本,BuildT 是一个 monad 转换器,BuildM 是一个 monad。 BuildM 只是应用于 Identity 的转换器。

type BuildT e i o m r = Pipe i o (ReaderT e m) r
type BuildM e i o   r = BuildT e i o Identity r

local 运行 是一个构建器,它从环境中读取整个输入。我们可能想给它一个不同的名称,以避免与为 ReaderT

定义的 local 冲突
local :: (Monad m, Foldable f) => BuildT (f i) i o m () -> Proxy a' a () o (ReaderT (f i) m) ()
local subDef = do
    e <- lift ask
    hoist lift $ runReaderP e $
        P.each e >->
        subDef

为了收集子计算的结果,我们利用管道非常纯净的事实,只要您有自然转换,您就可以换掉底层的 monad forall x. m x -> n x。来自管道的代理有一个 MFunctor 实例,它提供了一个函数 hoist :: (forall x. m x -> n x) -> Proxy a' a b' b m r -> Proxy a' a b' b n r;它让我们提升管道下的所有底层 monad 操作以在另一个转换器上使用管道,在本例中 WriterT.

collect :: (Monad m) => Proxy a' a () b m r -> Proxy a' a c' c m ([b], r)
collect subDef = do
    (r, w) <- runWriterP $
        hoist lift subDef //> \x -> lift $ tell (++[x])
    return (w [], r)

我们向 运行 构建器提供来自环境的所有输入,提供初始环境,收集结果,然后 运行 整个管道。

runBuildT :: (Monad m) => [i] -> BuildT [i] i o m () -> m [o]
runBuildT e = runEffect . fmap fst . collect . runReaderP e . local

运行 monad 而不是 transformer 只是

runBuildM :: [i] -> BuildM [i] i o () -> [o]
runBuildM e = runIdentity . runBuildT e

列表T

本节让我们在生成事物的所有组合时使用 do-notation。这相当于使用管道的 for 代替每个 >>=yield 代替每个 return.

gather 子计算的所有结果的语法正在重新发明 ListT。一个 ListT m a 持有一个 Producer a m (),只有 return 的下游数据。从上游获取数据和下游 return 数据的管道不适合 Producer b m ()。这需要一些转换。

我们可以将同时具有上游和下游接口的 Proxy 转换为只有一个下游接口环绕另一个具有上游接口的代理的接口。为此,我们将底层 monad 提升到我们新的内部上游代理中,然后将外部下游代理中的所有 request 替换为从内部上游代理中提取的 request

floatRespond :: (Monad m) => Proxy a' a b' b m r -> Proxy c' c b' b (Proxy a' a d' d m) r
floatRespond = (lift . request >\) . hoist lift

这些可以转换成ListT。我们将丢弃任何 returned 数据以获得更多的多态类型。

gather :: (Monad m) => Proxy a' a () b m r -> P.ListT (Proxy a' a c' c m) b
gather = P.Select . floatRespond . (>>= return . const ())

使用ListT使用起来有点麻烦;您需要 return 之间的 mplus 才能获得两个输出。将代理推入 ListT 通常很方便,因此您可以 lift . yield 而不是 returning。我们将丢弃所有 ListT 结果,依赖来自 lift . yield.enumeratejust runs aListT` 的输出,包裹任何东西,丢弃所有结果

enumerate = P.runListT

例子

我们现在可以编写 运行 您的示例。我想你的意思是 source 从源中获取一个值,而 yield 到 return 一个值。如果您不需要一次获取一个值,那么您的问题被过度指定并且这个答案是矫枉过正的。

source = P.await
yield = P.yield

在示例中,我们使用 gather 构建列表,我们 运行 使用 enumerate 的代码部分并使用 lift . yield 生成结果。

import Data.Char

build_tests :: Monad m => Int -> BuildT [String] String String m ()
build_tests depth = do
  local $ do
    v <- source
    yield $ v
    yield $ (map toLower v)
  yield "[]"
  yield "()"
  when (depth > 2) $ enumerate $ do
    t1 <- gather $ build_tests (depth-1)
    lift . yield $ "(" ++ t1 ++ ")"
    lift . yield $ "[" ++ t1 ++ "]"
    t2 <- gather $ build_tests (depth-1)
    lift . yield $ "(" ++ t1 ++ "," ++ t2 ++ ")"

如果我们运行这个带有输入["A", "B"]的示例,则永远不会使用"B"输入,因为source在每个local中只使用过一次.

main = do
    putStrLn "Depth 2"
    print =<< runBuildT ["A", "B"] (build_tests 2)
    putStrLn "Depth 3"
    print =<< runBuildT ["A", "B"] (build_tests 3)

深度小于 4 的输出足够小,可以在此处重复。

["A","a","[]","()"]
Depth 3
["A","a","[]","()","(A)","[A]","(A,A)","(A,a)","(A,[])","(A,())","(a)","[a]","(a,A)","(a,a)","(a,[])","(a,())","([])","[[]]","([],A)","([],a)","([],[])","([],())","(())","[()]","((),A)","((),a)","((),[])","((),())"]

这可能有点矫枉过正

我怀疑您可能希望 source 从源头获取所有内容。

source = gather P.cat
yield = P.yield

如果我们将此用于示例而不是从源中获取单个项目,我们将 enumerate 第一个 local 块并通过 returning 在 ListT.

build_tests :: Monad m => Int -> BuildT [String] String String m ()
build_tests depth = do
  local $ enumerate $ do
    v <- source
    lift . yield $ v
    lift . yield $ (map toLower v)
  yield "[]"
  yield "()"
  when (depth > 2) $ enumerate $ do
    t1 <- gather $ build_tests (depth-1)
    lift . yield $ "(" ++ t1 ++ ")"
    lift . yield $ "[" ++ t1 ++ "]"
    t2 <- gather $ build_tests (depth-1)
    lift . yield $ "(" ++ t1 ++ "," ++ t2 ++ ")"

当我们 运行 具有两个源的示例时,这会使用两个源值。

Depth 2
["A","a","B","b","[]","()"]
Depth 3
["A","a","B","b","[]","()","(A)","[A]","(A,A)","(A,a)","(A,B)","(A,b)","(A,[])","(A,())","(a)","[a]","(a,A)","(a,a)","(a,B)","(a,b)","(a,[])","(a,())","(B)","[B]","(B,A)","(B,a)","(B,B)","(B,b)","(B,[])","(B,())","(b)","[b]","(b,A)","(b,a)","(b,B)","(b,b)","(b,[])","(b,())","([])","[[]]","([],A)","([],a)","([],B)","([],b)","([],[])","([],())","(())","[()]","((),A)","((),a)","((),B)","((),b)","((),[])","((),())"]

如果您从未从源中获得任何值,您可以使用 ListT (ReaderT [i] m) o 代替。您可能仍然需要一个代理来避免弄乱 mplus.

如果 is overkill, the continuation monad transformer provides a convenient way to construct any MonadPlus 个值。

continuation monad 让我们很容易捕捉到做某事的想法mplus 未知的余数。

import Control.Monad
import Control.Monad.Trans.Cont

once :: MonadPlus m => m a -> ContT a m ()
once m = ContT $ \k -> m `mplus` k ()

产生结果只返回一次。

yield :: MonadPlus m => a -> ContT a m ()
yield = once . return

我们可以通过在最后粘贴 mzero 来收集所有结果。

gather :: MonadPlus m => ContT a m r -> m a
gather m = runContT m (const mzero)

您的示例是根据 yieldgatheroncelift.

编写的
import Data.Char

import Control.Monad.Trans.Class

build_tests :: MonadPlus m => m String -> Int -> ContT String m ()
build_tests source = go
  where
    go depth = do
      once . gather $ do
        v <- lift source
        yield v
        yield (map toLower v)
      yield "[]"
      yield "()"
      when (depth > 2) $ do
        t1 <- lift . gather $ go (depth-1)
        yield $ "(" ++ t1 ++ ")"
        yield $ "[" ++ t1 ++ "]"
        t2 <- lift . gather $ go (depth-1)
        yield $ "(" ++ t1 ++ "," ++ t2 ++ ")"

main = print . gather $ build_tests ["A", "B"] 3

这会输出以下内容:

Depth 2
["A","a","B","b","[]","()"]
Depth 3
["A","a","B","b","[]","()","(A)","[A]","(A,A)","(A,a)","(A,B)","(A,b)","(A,[])","(A,())","(a)","[a]","(a,A)","(a,a)","(a,B)","(a,b)","(a,[])","(a,())","(B)","[B]","(B,A)","(B,a)","(B,B)","(B,b)","(B,[])","(B,())","(b)","[b]","(b,A)","(b,a)","(b,B)","(b,b)","(b,[])","(b,())","([])","[[]]","([],A)","([],a)","([],B)","([],b)","([],[])","([],())","(())","[()]","((),A)","((),a)","((),B)","((),b)","((),[])","((),())"]

为简单起见,我冒昧地取消了从环境中读取原始资源的要求。您可以将 ReaderT 添加到转换器堆栈以将其取回。我也没有为你选择列表转换器,示例是 运行 使用普通列表 monad。由于它是根据 MonadPlus 编写的,因此它也适用于任何 (MonadTrans t, MonadPlus (t m)) => t m

您正在尝试重新发明 pipes and some 。这个问题比你如何描述它要简单得多。字符串的来源可以完全与构建结构分开。

您想生成从某些来源绘制符号的结构。不用担心来源,让我们构建结构。每个结构都是一个 Pipe,它将从一些源和 yield 字符串中提取以连接在一起以构建表达式。

import Data.Char

import Data.Functor.Identity

import Pipes.Core
import Pipes ((>->))
import qualified Pipes as P
import qualified Pipes.Prelude as P

build_structures :: Int -> [Pipe String String Identity ()]
build_structures depth = gather $ do
    yield $ P.take 1
    yield $ P.map (map toLower) >-> P.take 1
    when (depth > 2) $ do
        t1 <- lift $ build_structures (depth - 1)
        yield $ P.yield "(" >> t1 >> P.yield ")"
        yield $ P.yield "[" >> t1 >> P.yield "]"
        t2 <- lift $ build_structures (depth - 1)
        yield $ P.yield "(" >> t1 >> P.yield "," >> t2 >> P.yield ")"

此代码使用后续答案中的

我们 运行 通过向其提供符号并连接结果来 运行 这些结构之一。

run :: Pipe String String Identity () -> String
run p = concat . P.toList $ P.each symbols >-> p

-- an infinite source of unique symbols
symbols :: [String]
symbols = drop 1 symbols'
    where
        symbols' = [""] ++ do
            tail <- symbols'
            first <- ['A'..'Z']
            return (first : tail)

示例生成所需的字符串。我将留下两个特殊情况 "[]""()",它们不会出现在递归术语中,作为练习。

import Data.Functor

main = do
    putStrLn "Depth 2"
    print $ run <$> build_structures 2
    putStrLn "Depth 3"
    print $ run <$> build_structures 3
    putStrLn "Depth 4"
    print $ run <$> build_structures 4

这导致

Depth 2
["A","a"]
Depth 3
["A","a","(A)","[A]","(A,B)","(A,b)","(a)","[a]","(a,B)","(a,b)"]
Depth 4
["A","a","(A)","[A]","(A,B)","(A,b)","(A,(B))","(A,[B])","(A,(B,C))","(A,(B,c))","(A,(b))","(A,[b])","(A,(b,C))","(A,(b,c))","(a)","[a]","(a,B)","(a,b)","(a,(B))","(a,[B])","(a,(B,C))","(a,(B,c))","(a,(b))","(a,[b])",...