Haskell: 当不需要日志时,让 Writer 和普通代码一样高效
Haskell: make a Writer as efficient as normal code when log is not needed
我想写一个代码,可以 运行 分成两个 "modes":
- 要么在日志记录模式下,即它应该记录一些信息(在我的例子中,我想记录在给定时间对某些特定函数完成的调用次数)
- 或在高效模式下,即它不记录任何内容,只是尽可能快地记录 运行s
我试着写了下面的代码,它创建了两个 Writer,一个普通的(用于日志记录模式)和一个愚蠢的(不记录任何东西,用于高效模式)。然后我定义了一个新的 class LogFunctionCalls
允许我在这两个 Writers 之一中 运行 我的函数。
但是,我尝试比较使用 Stupid writer 的代码速度,它比没有 writer 的普通代码慢得多:这里是分析信息:
- 没有编写器的代码:总时间 = 0.27 秒,总分配 = 55,800 字节
- 代码与愚蠢的作者
StupidLogEntry
:总时间= 0.74 秒,总分配= 600,060,408 字节(注意:实际时间远大于0.74 秒...)
- 真实作者的代码
LogEntry
:总时间= 5.03秒,总分配= 1,920,060,624字节
这是代码(您可以根据要使用的 运行 进行评论):
{-# LANGUAGE ScopedTypeVariables #-}
module Main where
--- It depends on the transformers, containers, and base packages.
--- You can profile it with:
--- $ cabal v2-run --enable-profiling debug -- +RTS -p
--- and a file debug.prof will be created.
import qualified Data.Map.Strict as MapStrict
import qualified Data.Map.Merge.Strict as MapMerge
import qualified Control.Monad as CM
import Control.Monad.Trans.Writer.Strict (Writer)
import qualified Control.Monad.Trans.Writer.Strict as Wr
import qualified Data.Time as Time
-- Test using writer monad
-- The actual LogEntry, that should associate a number
-- to each name
newtype LogEntry = LogEntry { logMap:: MapStrict.Map String Int }
deriving (Eq, Show)
-- A logentry that does not record anything, always empty
newtype StupidLogEntry = StupidLogEntry { stupidLogMap:: MapStrict.Map String Int }
deriving (Eq, Show)
-- Create the Monoid instances
instance Semigroup LogEntry where
(LogEntry m1) <> (LogEntry m2) =
LogEntry $ MapStrict.unionWith (+) m1 m2
instance Monoid LogEntry where
mempty = LogEntry MapStrict.empty
instance Semigroup StupidLogEntry where
(StupidLogEntry m1) <> (StupidLogEntry m2) =
StupidLogEntry $ m1
instance Monoid StupidLogEntry where
mempty = StupidLogEntry MapStrict.empty
-- Create a class that allows me to use the function "myTell"
-- that adds a number in the writer (either the LogEntry
-- or StupidLogEntry one)
class (Monoid r) => LogFunctionCalls r where
myTell :: String -> Int -> Writer r ()
instance LogFunctionCalls LogEntry where
myTell namefunction n = do
Wr.tell $ LogEntry $ MapStrict.singleton namefunction n
instance LogFunctionCalls StupidLogEntry where
myTell namefunction n = do
-- Wr.tell $ StupidLogEntry $ Map.singleton namefunction n
return ()
-- Function in itself, with writers
countNumberCalls :: (LogFunctionCalls r) => Int -> Writer r Int
countNumberCalls 0 = return 0
countNumberCalls n = do
myTell "countNumberCalls" 1
x <- countNumberCalls $ n - 1
return $ 1 + x
--- Without any writer, pretty efficient
countNumberCallsNoWriter :: Int -> Int
countNumberCallsNoWriter 0 = 0
countNumberCallsNoWriter n = 1 + countNumberCallsNoWriter (n-1)
main :: IO ()
main = do
putStrLn $ "Hello"
-- Version without any writter
print =<< Time.getZonedTime
let n = countNumberCallsNoWriter 15000000
putStrLn $ "Without any writer, the result is " ++ (show n)
-- Version with Logger
print =<< Time.getZonedTime
let (n, log :: LogEntry) = Wr.runWriter $ countNumberCalls 15000000
putStrLn $ "The result is " ++ (show n)
putStrLn $ "With the logger, the number of calls is " ++ (show $ (logMap log))
-- Version with the stupid logger
print =<< Time.getZonedTime
let (n, log :: StupidLogEntry) = Wr.runWriter $ countNumberCalls 15000000
putStrLn $ "The result is " ++ (show n)
putStrLn $ "With the stupid logger, the number of calls is " ++ (show $ (stupidLogMap log))
print =<< Time.getZonedTime
Writer
monad 是瓶颈。一个更好的方法来概括你的代码,以便它可以 运行 在这两个 "modes" 中是改变接口, 即 , LogFunctionCalls
class, 由 monad 参数化:
class Monad m => LogFunctionCalls m where
myTell :: String -> Int -> m ()
然后我们可以使用身份 monad(或 monad 转换器)简单地实现它:
newtype NoLog a = NoLog a
deriving (Functor, Applicative, Monad) via Identity
instance LogFunctionCalls NoLog where
myTell _ _ = pure ()
另请注意,要测试的函数现在具有不同的类型,不再明确引用 Writer
:
countNumberCalls :: (LogFunctionCalls m) => Int -> m Int
让我们把它放在一个基准测试中,它有评论中指出的各种方法论问题,但是,如果我们用 ghc -O
:
编译它,仍然会发生一些有趣的事情
main :: IO ()
main = do
let iternumber = 1500000
putStrLn $ "Hello"
t0 <- Time.getCurrentTime
-- Non-monadic version
let n = countNumberCallsNoWriter iternumber
putStrLn $ "Without any writer, the result is " ++ (show n)
t1 <- Time.getCurrentTime
print (Time.diffUTCTime t1 t0)
-- NoLog version
let n = unNoLog $ countNumberCalls iternumber
putStrLn $ "The result is " ++ (show n)
t2 <- Time.getCurrentTime
print (Time.diffUTCTime t2 t1)
输出:
Hello
Without any writer, the result is 1500000
0.022030957s
The result is 1500000
0.000081533s
正如我们所见,第二个版本(我们关心的版本)花费了零时间。如果我们从基准测试中删除第一个版本,那么剩下的版本将占用前者的 0.022s。
所以 GHC 实际上优化了两个基准测试中的一个,因为它看到它们是相同的,这实现了我们最初想要的:"logging" 代码 运行s 与专用代码一样快,没有日志记录,因为它们实际上是相同的,基准数字无关紧要。
这也可以通过查看生成的核心来确认; 运行 ghc -O -ddump-simpl -ddump-to-file -dsuppres-all
并理解文件 Main.dump-simpl
。或者使用 inspection-testing.
可编译要点:https://gist.github.com/Lysxia/2f98c4a8a61034dcc614de5e95d7d5f8
我想写一个代码,可以 运行 分成两个 "modes":
- 要么在日志记录模式下,即它应该记录一些信息(在我的例子中,我想记录在给定时间对某些特定函数完成的调用次数)
- 或在高效模式下,即它不记录任何内容,只是尽可能快地记录 运行s
我试着写了下面的代码,它创建了两个 Writer,一个普通的(用于日志记录模式)和一个愚蠢的(不记录任何东西,用于高效模式)。然后我定义了一个新的 class LogFunctionCalls
允许我在这两个 Writers 之一中 运行 我的函数。
但是,我尝试比较使用 Stupid writer 的代码速度,它比没有 writer 的普通代码慢得多:这里是分析信息:
- 没有编写器的代码:总时间 = 0.27 秒,总分配 = 55,800 字节
- 代码与愚蠢的作者
StupidLogEntry
:总时间= 0.74 秒,总分配= 600,060,408 字节(注意:实际时间远大于0.74 秒...) - 真实作者的代码
LogEntry
:总时间= 5.03秒,总分配= 1,920,060,624字节
这是代码(您可以根据要使用的 运行 进行评论):
{-# LANGUAGE ScopedTypeVariables #-}
module Main where
--- It depends on the transformers, containers, and base packages.
--- You can profile it with:
--- $ cabal v2-run --enable-profiling debug -- +RTS -p
--- and a file debug.prof will be created.
import qualified Data.Map.Strict as MapStrict
import qualified Data.Map.Merge.Strict as MapMerge
import qualified Control.Monad as CM
import Control.Monad.Trans.Writer.Strict (Writer)
import qualified Control.Monad.Trans.Writer.Strict as Wr
import qualified Data.Time as Time
-- Test using writer monad
-- The actual LogEntry, that should associate a number
-- to each name
newtype LogEntry = LogEntry { logMap:: MapStrict.Map String Int }
deriving (Eq, Show)
-- A logentry that does not record anything, always empty
newtype StupidLogEntry = StupidLogEntry { stupidLogMap:: MapStrict.Map String Int }
deriving (Eq, Show)
-- Create the Monoid instances
instance Semigroup LogEntry where
(LogEntry m1) <> (LogEntry m2) =
LogEntry $ MapStrict.unionWith (+) m1 m2
instance Monoid LogEntry where
mempty = LogEntry MapStrict.empty
instance Semigroup StupidLogEntry where
(StupidLogEntry m1) <> (StupidLogEntry m2) =
StupidLogEntry $ m1
instance Monoid StupidLogEntry where
mempty = StupidLogEntry MapStrict.empty
-- Create a class that allows me to use the function "myTell"
-- that adds a number in the writer (either the LogEntry
-- or StupidLogEntry one)
class (Monoid r) => LogFunctionCalls r where
myTell :: String -> Int -> Writer r ()
instance LogFunctionCalls LogEntry where
myTell namefunction n = do
Wr.tell $ LogEntry $ MapStrict.singleton namefunction n
instance LogFunctionCalls StupidLogEntry where
myTell namefunction n = do
-- Wr.tell $ StupidLogEntry $ Map.singleton namefunction n
return ()
-- Function in itself, with writers
countNumberCalls :: (LogFunctionCalls r) => Int -> Writer r Int
countNumberCalls 0 = return 0
countNumberCalls n = do
myTell "countNumberCalls" 1
x <- countNumberCalls $ n - 1
return $ 1 + x
--- Without any writer, pretty efficient
countNumberCallsNoWriter :: Int -> Int
countNumberCallsNoWriter 0 = 0
countNumberCallsNoWriter n = 1 + countNumberCallsNoWriter (n-1)
main :: IO ()
main = do
putStrLn $ "Hello"
-- Version without any writter
print =<< Time.getZonedTime
let n = countNumberCallsNoWriter 15000000
putStrLn $ "Without any writer, the result is " ++ (show n)
-- Version with Logger
print =<< Time.getZonedTime
let (n, log :: LogEntry) = Wr.runWriter $ countNumberCalls 15000000
putStrLn $ "The result is " ++ (show n)
putStrLn $ "With the logger, the number of calls is " ++ (show $ (logMap log))
-- Version with the stupid logger
print =<< Time.getZonedTime
let (n, log :: StupidLogEntry) = Wr.runWriter $ countNumberCalls 15000000
putStrLn $ "The result is " ++ (show n)
putStrLn $ "With the stupid logger, the number of calls is " ++ (show $ (stupidLogMap log))
print =<< Time.getZonedTime
Writer
monad 是瓶颈。一个更好的方法来概括你的代码,以便它可以 运行 在这两个 "modes" 中是改变接口, 即 , LogFunctionCalls
class, 由 monad 参数化:
class Monad m => LogFunctionCalls m where
myTell :: String -> Int -> m ()
然后我们可以使用身份 monad(或 monad 转换器)简单地实现它:
newtype NoLog a = NoLog a
deriving (Functor, Applicative, Monad) via Identity
instance LogFunctionCalls NoLog where
myTell _ _ = pure ()
另请注意,要测试的函数现在具有不同的类型,不再明确引用 Writer
:
countNumberCalls :: (LogFunctionCalls m) => Int -> m Int
让我们把它放在一个基准测试中,它有评论中指出的各种方法论问题,但是,如果我们用 ghc -O
:
main :: IO ()
main = do
let iternumber = 1500000
putStrLn $ "Hello"
t0 <- Time.getCurrentTime
-- Non-monadic version
let n = countNumberCallsNoWriter iternumber
putStrLn $ "Without any writer, the result is " ++ (show n)
t1 <- Time.getCurrentTime
print (Time.diffUTCTime t1 t0)
-- NoLog version
let n = unNoLog $ countNumberCalls iternumber
putStrLn $ "The result is " ++ (show n)
t2 <- Time.getCurrentTime
print (Time.diffUTCTime t2 t1)
输出:
Hello
Without any writer, the result is 1500000
0.022030957s
The result is 1500000
0.000081533s
正如我们所见,第二个版本(我们关心的版本)花费了零时间。如果我们从基准测试中删除第一个版本,那么剩下的版本将占用前者的 0.022s。
所以 GHC 实际上优化了两个基准测试中的一个,因为它看到它们是相同的,这实现了我们最初想要的:"logging" 代码 运行s 与专用代码一样快,没有日志记录,因为它们实际上是相同的,基准数字无关紧要。
这也可以通过查看生成的核心来确认; 运行 ghc -O -ddump-simpl -ddump-to-file -dsuppres-all
并理解文件 Main.dump-simpl
。或者使用 inspection-testing.
可编译要点:https://gist.github.com/Lysxia/2f98c4a8a61034dcc614de5e95d7d5f8