GHC 如何强制多线程应用程序中的评估?

How GHC forces evaluation in multithreaded applications?

例如:我有一个非常简单的斐波那契数列记忆实现, 我在多个线程中请求:

{-# LANGUAGE BangPatterns #-}
module Main where

import Control.Concurrent
import Control.DeepSeq
import System.Environment (getArgs)
import System.IO.Unsafe (unsafePerformIO)

data Tree a = Tree (Tree a) a (Tree a)
instance Functor Tree where
  fmap f (Tree l m r) = Tree (fmap f l) (f m) (fmap f r)

index :: Tree a -> Int -> a
index (Tree _ m _) 0 = m
index (Tree l _ r) n = case (n - 1) `divMod` 2 of
  (q,0) -> index l q
  (q,_) -> index r q

nats :: Tree Int
nats = go 0 1
  where go !n !s = Tree (go l s') n (go r s')
          where l = n + s
                r = l + s
                s' = s * 2

fib :: (Int -> Integer) -> Int -> Integer
fib _ 0 = 0
fib _ 1 = 1
fib f n = f (n - 1) + f (n - 2)

fib_tree :: Tree Integer
fib_tree = fmap (fib fastfib) nats

fastfib :: Int -> Integer
fastfib = index fib_tree

writeMutex :: MVar ()
writeMutex = unsafePerformIO (newMVar ())

fibIO :: Int -> IO ()
fibIO n = let fibn = fastfib n
          in deepseq fibn $ do takeMVar writeMutex
                               putStrLn (show n ++ " " ++ show fibn)
                               putMVar writeMutex ()

children :: MVar [MVar ()]
children = unsafePerformIO (newMVar [])

waitForChildren :: IO ()
waitForChildren = do
 cs <- takeMVar children
 case cs of
   []   -> return ()
   m:ms -> do
      putMVar children ms
      takeMVar m
      waitForChildren

forkChild :: IO () -> IO ThreadId
forkChild io = do
   mvar <- newEmptyMVar
   childs <- takeMVar children
   putMVar children (mvar:childs)
   forkFinally io (\_ -> putMVar mvar ())

main' :: [Int] -> IO ()
main' = mapM_ (forkChild . fibIO)

main :: IO ()
main = do
  nargs <- fmap read `fmap` getArgs :: IO [Int]
  main' nargs
  waitForChildren

当使用 -threaded 编译时,我可以 运行 它:

% time ./concur 10 10 10 10 10 10 10 +RTS -N4
10 55
10 55
10 55
10 55
10 55
10 55
10 55
./concur 10 10 10 10 10 10 10 +RTS -N4  0.00s user 0.00s system 82% cpu 0.007 total

正如预期的那样,如果我有一个或多个大参数,执行时间是相同的。

我对记忆树中的 thunk 评估是如何在低级别执行的很感兴趣?

当一个线程评估一个 thunk 时,该块被锁定,其他线程阻塞在它上面(又名黑洞)。有关详细信息,请参阅 Haskell on a Shared-Memory Multiprocessor 论文。