Haskell Conduit: is it possible to optionally have the result of a source?

Question

I built the following types from Data.Conduit:

type Footers = [(ByteString, ByteString)]

type DataAndConclusion = ConduitM () ByteString IO Footers

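The second type is meant to be read as "produce a lot of ByteStrings, and if you can produce all of them, return a Footers". The condition is there because conduits are driven by the downstream functions, so the consumer of a DataAndConclusion may not need to consume all of its items, and in that case the return is never reached. That is exactly the behavior I need. But when the end of the source is reached, I would like to have the Footers it produced. This would be useful, for example, if the DataAndConclusion were incrementally computing an MD5 and that MD5 was only needed if the entire message was processed by the downstream (e.g. downstream could simply be sending it over the network, and it makes no sense to finish the computation and send the MD5 if the socket was closed before downstream sent the last chunk).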

So, basically, I want something with this signature to consume a DataAndConclusion:

 type MySink = Sink ByteString IO ()

 mySink :: MySink 
 mySink = ... 

 difficultFunction :: ConduitM () a2 m r1 -> ConduitM a2 Void m r2 -> m (Maybe r1)

The question is: is there any way to implement "difficultFunction"? How?

Answer 1

"This would be useful for example if the DataAndConclusion were incrementally computing an MD5 and such MD5 was only needed if the entire message was processed by the downstream"

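Instead of relying on the return value of the upstream conduit, in this case perhaps you could accumulate the ongoing MD5 computation in a StateT layer beneath ConduitM, and access it after running the conduit.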
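A minimal sketch of that idea, not a definitive implementation. To stay dependency-free it assumes a running byte count as a stand-in for a real incremental MD5 context; Digest, trackedSource, runTracked and example are hypothetical names, and runConduit with .| assumes a reasonably recent conduit.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, modify', runStateT)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import Data.Conduit
import qualified Data.Conduit.List as CL
import Data.Void (Void)

-- Hypothetical stand-in for an incremental MD5 context: a running byte count.
type Digest = Int

-- Folds each chunk into the StateT-held digest, then yields it downstream.
trackedSource :: Monad m => [ByteString] -> ConduitM () ByteString (StateT Digest m) ()
trackedSource = mapM_ step
  where
    step chunk = do
      lift (modify' (+ B.length chunk))
      yield chunk

-- Runs source and sink together. The digest is read back from the StateT
-- layer after the pipeline has stopped, whichever side stopped it.
runTracked :: Monad m
           => [ByteString]
           -> ConduitM ByteString Void (StateT Digest m) r
           -> m (r, Digest)
runTracked chunks sink = runStateT (runConduit (trackedSource chunks .| sink)) 0

-- A sink that stops after one chunk: later chunks never reach the digest.
example :: IO ([ByteString], Digest)
example = runTracked [B.pack [1, 2], B.pack [3, 4, 5]] (CL.take 1)
```

Note that the state alone does not say whether the source finished; it only holds whatever was accumulated up to the point where the pipeline stopped.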

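The other part of the puzzle is detecting whether the producer finished first. Sinks can detect the end of upstream input in their await calls. You could write a Sink that reports upstream termination in its own result type, perhaps with a Maybe.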
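As an illustration of a sink that reports upstream termination through its result, here is a small sketch. collectN and example are hypothetical names, and the convention (Nothing means upstream ended first) is just one possible choice.

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL
import Data.Void (Void)

-- Hypothetical sink that tries to collect exactly n items.
--   Just items -> the sink stopped on its own after n items
--   Nothing    -> upstream ran out first (await returned Nothing)
collectN :: Monad m => Int -> ConduitM i Void m (Maybe [i])
collectN n = go n []
  where
    go k acc
      | k <= 0 = return (Just (reverse acc))
      | otherwise = do
          mx <- await
          case mx of
            Nothing -> return Nothing
            Just x  -> go (k - 1) (x : acc)

-- First component: the sink got its 2 items. Second: upstream ended early.
example :: IO (Maybe [Int], Maybe [Int])
example = do
  enough <- runConduit (CL.sourceList [1, 2, 3] .| collectN 2)
  short  <- runConduit (CL.sourceList [1, 2, 3] .| collectN 5)
  return (enough, short)
```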

But what if the Sink you are given doesn't already do this? We would need a function like Sink i m r -> Sink i m (Maybe r): "Given a Sink that may halt early, return a new Sink that returns Nothing if upstream finishes first". But I don't know how to write that function.

Edit: This conduit sets an IORef to True when it detects upstream termination:

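import Control.Monad.IO.Class (liftIO)
import Data.Conduit
import Data.IORef (IORef, writeIORef)

-- Passes every item through unchanged; flips the IORef to True only when
-- await reports that upstream has no more input.
detectUpstreamClose :: IORef Bool -> Conduit i IO i
detectUpstreamClose ref = conduit
  where
    conduit = do
        m <- await
        case m of
          Nothing -> liftIO (writeIORef ref True)
          Just i -> do
              yield i
              conduit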

detectUpstreamClose can be inserted into the pipeline, and the IORef can be inspected afterwards.
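A sketch of how that wiring might look. The conduit is repeated (in a slightly compressed form, with the ConduitM spelling of its type) so the example compiles on its own; runAndCheck is a hypothetical helper, and runConduit with .| assumes a reasonably recent conduit.

```haskell
import Control.Monad.IO.Class (liftIO)
import Data.Conduit
import qualified Data.Conduit.List as CL
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Void (Void)

-- Same behavior as the conduit above, repeated so this sketch is standalone.
detectUpstreamClose :: IORef Bool -> ConduitM i i IO ()
detectUpstreamClose ref = loop
  where
    loop = await >>= maybe (liftIO (writeIORef ref True)) (\i -> yield i >> loop)

-- Hypothetical helper: runs a list-backed source through the detector into
-- the given sink, then reports whether the detector saw the end of upstream.
runAndCheck :: [i] -> ConduitM i Void IO r -> IO (r, Bool)
runAndCheck xs sink = do
  ref <- newIORef False
  r <- runConduit (CL.sourceList xs .| detectUpstreamClose ref .| sink)
  sawEnd <- readIORef ref
  return (r, sawEnd)
```

One consequence of this design: the flag is only set if the sink awaits once more after the last item. A sink that takes exactly as many items as the source has, and then stops, leaves the flag at False.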

Answer 2

There definitely should be a nice solution, but I wasn't able to construct one using the ConduitM primitives. Something with the signature

ConduitM i a m r1 -> ConduitM a o m r2 -> ConduitM i o m (Maybe r1, r2)

It looks like a primitive function with this signature would be a nice addition to the conduit library.
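For what it's worth, newer releases of conduit export a function of this shape, fuseBothMaybe, from Data.Conduit (to my knowledge since 1.2.4; treat the version as an assumption and check your installed release). A small sketch of using it, where countingSource, sourceResult and example are hypothetical names:

```haskell
import Data.Conduit
import qualified Data.Conduit.List as CL
import Data.Void (Void)

-- Hypothetical source: yields every item, then returns how many it produced.
countingSource :: Monad m => [a] -> ConduitM () a m Int
countingSource xs = mapM_ yield xs >> return (length xs)

-- Fuses source and sink, keeping the source's result only if it terminated.
sourceResult :: Monad m => ConduitM () a m r1 -> ConduitM a Void m r2 -> m (Maybe r1, r2)
sourceResult src sink = runConduit (fuseBothMaybe src sink)

-- The sink stops after two items, so the source never gets to return.
example :: IO (Maybe Int, [Int])
example = sourceResult (countingSource [1, 2, 3]) (CL.take 2)
```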

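Nevertheless, @danidiaz's suggestion about StateT led me to the following generic solution, which internally lifts the whole computation into WriterT in order to remember the result of the first conduit, if it is reached: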

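import Control.Monad
import Control.Monad.Trans
import Control.Monad.Trans.Writer
import Data.Conduit
import Data.Monoid
import Data.Void

-- Runs source and sink together. The source's result is Just only if the
-- source ran to completion; otherwise it is Nothing.
difficultFunction :: (Monad m)
                  => ConduitM () a2 m r1 -> ConduitM a2 Void m r2
                  -> m (r2, Maybe r1)
difficultFunction l r = liftM (fmap getLast) $ runWriterT (l' $$ r')
  where
    -- Lift the source into WriterT and record its result once it returns.
    l' = transPipe lift l >>= lift . tell . Last . Just
    -- Lift the sink into the same monad so the two can be connected.
    r' = transPipe lift r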

(Untested!)
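To see how the WriterT approach behaves, here is a self-contained restatement with a usage example. It is a sketch under assumptions: it uses runConduit and .| in place of $$ (assuming a conduit version that provides them), and runBoth and countingSource are hypothetical names.

```haskell
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Writer.Strict (runWriterT, tell)
import Data.Conduit
import qualified Data.Conduit.List as CL
import Data.Monoid (Last (..))
import Data.Void (Void)

-- Same idea as difficultFunction: the source's result is written to a
-- WriterT layer only if the source actually returns.
runBoth :: Monad m => ConduitM () a m r1 -> ConduitM a Void m r2 -> m (r2, Maybe r1)
runBoth src sink = do
  (r2, recorded) <- runWriterT (runConduit (src' .| sink'))
  return (r2, getLast recorded)
  where
    src'  = transPipe lift src >>= lift . tell . Last . Just
    sink' = transPipe lift sink

-- Hypothetical source: yields every item, then returns how many it produced.
countingSource :: Monad m => [a] -> ConduitM () a m Int
countingSource xs = mapM_ yield xs >> return (length xs)

-- Full consumption versus an early stop after two items.
example :: IO ((String, Maybe Int), (String, Maybe Int))
example = do
  full  <- runBoth (countingSource "abc") CL.consume
  early <- runBoth (countingSource "abc") (CL.take 2)
  return (full, early)
```

The source's result is recorded only when the sink awaits past the final item, so a sink that takes exactly three items from this source would still get Nothing.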

