Haskell: 当不需要日志时,让 Writer 和普通代码一样高效

Haskell: make a Writer as efficient as normal code when log is not needed

我想写一个代码,可以 运行 分成两个 "modes":

我试着写了下面的代码,它创建了两个 Writer,一个普通的(用于日志记录模式)和一个愚蠢的(不记录任何东西,用于高效模式)。然后我定义了一个新的 class LogFunctionCalls 允许我在这两个 Writers 之一中 运行 我的函数。

但是,我尝试比较使用 Stupid writer 的代码速度,它比没有 writer 的普通代码慢得多:这里是分析信息:

这是代码(您可以根据要使用的 运行 进行评论):

{-# LANGUAGE ScopedTypeVariables #-}
module Main where

--- It depends on the transformers, containers, and base packages.

--- You can profile it with:
--- $ cabal v2-run --enable-profiling debug -- +RTS -p
--- and a file debug.prof will be created.

import qualified Data.Map.Strict as MapStrict
import qualified Data.Map.Merge.Strict as MapMerge

import qualified Control.Monad as CM
import Control.Monad.Trans.Writer.Strict (Writer)
import qualified Control.Monad.Trans.Writer.Strict as Wr
import qualified Data.Time as Time

-- Test using writer monad

-- The actual LogEntry, that should associate a number
-- to each name
newtype LogEntry = LogEntry { logMap:: MapStrict.Map String Int }
  deriving (Eq, Show)

-- A logentry that does not record anything, always empty
newtype StupidLogEntry = StupidLogEntry { stupidLogMap:: MapStrict.Map String Int }
  deriving (Eq, Show)

-- Create the Monoid instances
instance Semigroup LogEntry where
  (LogEntry m1) <> (LogEntry m2) =
    LogEntry $ MapStrict.unionWith (+) m1 m2
instance Monoid LogEntry where
  mempty = LogEntry MapStrict.empty

instance Semigroup StupidLogEntry where
  (StupidLogEntry m1) <> (StupidLogEntry m2) =
    StupidLogEntry $ m1
instance Monoid StupidLogEntry where
  mempty = StupidLogEntry MapStrict.empty

-- Create a class that allows me to use the function "myTell"
-- that adds a number in the writer (either the LogEntry
-- or StupidLogEntry one)
class (Monoid r) => LogFunctionCalls r where
  myTell :: String -> Int -> Writer r ()

instance LogFunctionCalls LogEntry where
  myTell namefunction n = do
    Wr.tell $ LogEntry $ MapStrict.singleton namefunction n

instance LogFunctionCalls StupidLogEntry where
  myTell namefunction n = do
    -- Wr.tell $ StupidLogEntry $ Map.singleton namefunction n
    return ()

-- Function in itself, with writers
countNumberCalls :: (LogFunctionCalls r) => Int -> Writer r Int
countNumberCalls 0 = return 0
countNumberCalls n = do
  myTell "countNumberCalls" 1
  x <- countNumberCalls $ n - 1
  return $ 1 + x

--- Without any writer, pretty efficient
countNumberCallsNoWriter :: Int -> Int
countNumberCallsNoWriter 0 = 0
countNumberCallsNoWriter n = 1 + countNumberCallsNoWriter (n-1)

main :: IO ()
main = do
  putStrLn $ "Hello"
  -- Version without any writter
  print =<< Time.getZonedTime
  let n = countNumberCallsNoWriter 15000000
  putStrLn $ "Without any writer, the result is " ++ (show n)
  -- Version with Logger
  print =<< Time.getZonedTime
  let (n, log :: LogEntry) = Wr.runWriter $ countNumberCalls 15000000
  putStrLn $ "The result is " ++ (show n)
  putStrLn $ "With the logger, the number of calls is " ++ (show $ (logMap log))
  -- Version with the stupid logger
  print =<< Time.getZonedTime
  let (n, log :: StupidLogEntry) = Wr.runWriter $ countNumberCalls 15000000
  putStrLn $ "The result is " ++ (show n)
  putStrLn $ "With the stupid logger, the number of calls is " ++ (show $ (stupidLogMap log))
  print =<< Time.getZonedTime  

Writer monad 是瓶颈。一个更好的方法来概括你的代码,以便它可以 运行 在这两个 "modes" 中是改变接口,LogFunctionCalls class, 由 monad 参数化:

class Monad m => LogFunctionCalls m where
  myTell :: String -> Int -> m ()

然后我们可以使用身份 monad(或 monad 转换器)简单地实现它:

newtype NoLog a = NoLog a
  deriving (Functor, Applicative, Monad) via Identity

instance LogFunctionCalls NoLog where
  myTell _ _ = pure ()

另请注意,要测试的函数现在具有不同的类型,不再明确引用 Writer

countNumberCalls :: (LogFunctionCalls m) => Int -> m Int

让我们把它放在一个基准测试中,它有评论中指出的各种方法论问题,但是,如果我们用 ghc -O:

编译它,仍然会发生一些有趣的事情
main :: IO ()
main = do
  let iternumber = 1500000
  putStrLn $ "Hello"
  t0 <- Time.getCurrentTime

  -- Non-monadic version
  let n = countNumberCallsNoWriter iternumber
  putStrLn $ "Without any writer, the result is " ++ (show n)
  t1 <- Time.getCurrentTime
  print (Time.diffUTCTime t1 t0)

  -- NoLog version
  let n = unNoLog $ countNumberCalls iternumber
  putStrLn $ "The result is " ++ (show n)
  t2 <- Time.getCurrentTime
  print (Time.diffUTCTime t2 t1)

输出:

Hello
Without any writer, the result is 1500000
0.022030957s
The result is 1500000
0.000081533s

正如我们所见,第二个版本(我们关心的版本)花费了零时间。如果我们从基准测试中删除第一个版本,那么剩下的版本将占用前者的 0.022s。

所以 GHC 实际上优化了两个基准测试中的一个,因为它看到它们是相同的,这实现了我们最初想要的:"logging" 代码 运行s 与专用代码一样快,没有日志记录,因为它们实际上是相同的,基准数字无关紧要。

这也可以通过查看生成的核心来确认; 运行 ghc -O -ddump-simpl -ddump-to-file -dsuppres-all 并理解文件 Main.dump-simpl。或者使用 inspection-testing.

可编译要点:https://gist.github.com/Lysxia/2f98c4a8a61034dcc614de5e95d7d5f8