Haskell 中的 Sundaram 高效 (O(n^2)) 筛

Question

关于如何在 Haskell 中实施 Sundaram 筛法的 SO 上有很多答案，但它们都...真的很低效吗？

我见过的所有解决方案都是这样工作的：

找出要排除的数字 <= n
从 [1..n]
修改剩余号码* 2 + 1

例如，这是我的实现，它找到 1 和 2n+2 之间的所有素数：

sieveSundaram :: Integer -> [Integer]
sieveSundaram n = map (\x -> 2 * x + 1) $ filter (flip notElem toRemove) [1..n]
  where toRemove = [i + j + 2*i*j | i <- [1..n], j <- [i..n], i + j + 2*i*j <= n]

我遇到的问题是 filter 必须为 [1..n] 的每个元素遍历整个 toRemove 列表，因此复杂度为 O(n^3)而直接迭代实现的复杂度为 O(n^2)。我怎样才能在 Haskell 中做到这一点？

Answer 1

根据评论，base 不应被视为 Haskell 的完整标准库。每个 Haskell 开发人员都知道并使用了几个关键包，并且会考虑将其作为 Haskell 的 事实上 标准库的一部分。

通过“直接迭代实现”，我假设您的意思是标记和清除标志数组？通常为此使用 Vector 或 Array。（两者都将被视为“标准”。）O(n^2) Vector 解决方案如下所示。尽管它在内部使用可变向量，但批量更新运算符 (//) 隐藏了这一事实，因此您可以将其编写为典型的 Haskell 不可变和无状态样式：

import qualified Data.Vector as V

primesV :: Int -> [Int]
primesV n = V.toList                           -- the primes!
  . V.map (\x -> (x+1)*2+1)                    -- apply transformation
  . V.findIndices id                           -- get remaining indices
  . (V.// [(k - 1, False) | k <- removals n])  -- scratch removals
  $ V.replicate n True                         -- everyone's allowed

removals n = [i + j + 2*i*j | i <- [1..n], j <- [i..n], i + j + 2*i*j <= n]

另一种更直接的可能性是 IntSet，它基本上是一组具有 O(1) insertion/deletion 和 O(n) 有序遍历的整数。（这就像评论中建议的 HashSet，但专门用于整数。）这是 containers 包中的另一个“标准”包，它实际上与 GHC 源代码捆绑在一起，尽管它不同于base。它给出了一个 O(n^2) 的解决方案，如下所示：

import qualified Data.IntSet as I

primesI :: Int -> [Int]
primesI n = I.toAscList               -- the primes!
  . I.map (\x -> x*2+1)               -- apply transformation
  $ I.fromList [1..n]                 -- integers 1..n ...
    I.\ I.fromList (removals n)      -- ... except removals

请注意，另一个重要的性能改进是使用更好的 removals 定义，避免过滤所有 n^2 组合。我相信以下定义会产生相同的删除列表：

removals :: Int -> [Int]
removals n = [i + j + 2*i*j | j <- [1..(n-1) `div` 3], i <- [1..(n-j) `div` (1+2*j)]]

我认为这样做的时间复杂度为 O(n log(n))。如果你将它与上面的 primesV 或 primesI 一起使用，它就是瓶颈，所以我认为最终的整体算法应该是 O(n log(n))。

Answer 2

这个问题没有定义低效是什么意思。 OP 似乎挂断了使用 Haskell Lazy List 解决方案，这从一开始就是低效的，因为列表上的操作是顺序的，并且在需要为包含许多“的每个元素分配内存时具有很高的恒定开销”管道”部分在内部实现可能的懒惰。

正如我在评论中提到的，Sundaram 筛法的原始定义是模糊的，并且由于过度订阅表示奇数的范围而包含许多冗余操作；它可以大大简化，如那里所述。

然而，即使在最小化 SoS 的低效率之后并且如果使用 List 是人们想要的方式：正如 OP 所指出的那样，简单地重复过滤列表并不有效，因为每个 List 元素将有许多重复操作根据以下修改后的 OP 代码：

sieveSundaram :: Int -> [Int]
sieveSundaram n = map (\x -> 2 * x + 3) $ filter (flip notElem toRemove) [ 0 .. lmt ]
  where lmt = (n - 3) `div` 2
        sqrtlmt = (floor(sqrt(fromIntegral n)) - 3) `div` 2
        mkstrtibp i = ((i + i) * (i + 3) + 3, i + i + 3)
        toRemove = concat [ let (si, bp) = mkstrtibp i in [ si, si + bp .. lmt ]
                            | i <- [ 0 .. sqrtlmt ] ]

main :: IO ()
main = print $ sieveSundaram 1000

由于改进后的串联 toRemove 列表中有 O(n log n) 个值，并且必须对所有这些值进行扫描以查找所有奇数值以达到筛分极限，因此其渐近复杂度为 O(n^2 log n)，它回答了问题，但不是很好。

最快的列表素数过滤技术是延迟合并生成的复合剔除列表树（而不是仅仅连接它），然后生成不在合并复合中的所有奇数的输出列表（递增顺序以避免每次都扫描整个列表）。使用线性合并不是那么有效，但是我们可以使用无限的树状合并，它只会花费 log n 的额外因子，当乘以 O(n log n) 的复杂度时正确的 Sundaram 筛分法的剔除值给出了 O(n log^2 n) 的组合复杂度，这比以前的实现要小得多。

这个合并是有效的，因为每个连续的复合剔除List都是从最后一个奇数的平方增加2开始的，所以剔除序列List的每个基值在整个List的List中的第一个值已经在增加命令;因此，List of List 的简单合并排序不会竞争并且很容易实现：

primesSoS :: () -> [Int]   
primesSoS() = 2 : sel 3 (_U $ map(\n -> [n * n, n * n + n + n..]) [ 3, 5.. ]) where
  sel k s@(c:cs) | k < c     = k : sel (k+2) s  -- ~= ([k, k + 2..] \ s)
                 | otherwise =     sel (k+2) cs --      when null(s\[k, k + 2..]) 
  _U ((x:xs):t) = x : (merge xs . _U . pairs) t -- tree-shaped folding big union
  pairs (xs:ys:t) = merge xs ys : pairs t
  merge xs@(x:xs') ys@(y:ys') | x < y     = x : merge xs' ys
                              | y < x     = y : merge xs  ys'
                              | otherwise = x : merge xs' ys'

cLIMIT :: Int
cLIMIT = 1000

main :: IO ()
main = print $ takeWhile (<= cLIMIT) $ primesSoS()

当然，一定要问“为什么要用孙达拉姆筛子？”这个问题。当删除原始 SoS 公式的残骸时 (see the Wikipedia article)，很明显，SoS 和 Eratosthenes 的 Odds-Only Sieve 之间的唯一区别是 SoS 不过滤基本剔除奇数仅针对那些像 Odds-Only SoE 那样的主要对象。以下代码仅对找到的基本素数进行递归反馈：

primesSoE :: () -> [Int]   
primesSoE() = 2 : _Y ((3:) . sel 5 . _U . map (\n -> [n * n, n * n + n + n..])) where
  _Y g = g (_Y g)  -- = g (g (g ( ... )))   non-sharing multistage fixpoint combinator
  sel k s@(c:cs) | k < c     = k : sel (k+2) s  -- ~= ([k, k + 2..] \ s)
                 | otherwise =     sel (k+2) cs --      when null(s\[k, k + 2..]) 
  _U ((x:xs):t) = x : (merge xs . _U . pairs) t -- tree-shaped folding big union
  pairs (xs:ys:t) = merge xs ys : pairs t
  merge xs@(x:xs') ys@(y:ys') | x < y     = x : merge xs' ys
                              | y < x     = y : merge xs  ys'
                              | otherwise = x : merge xs' ys'

cLIMIT :: Int
cLIMIT = 1000

main :: IO ()
main = print $ takeWhile (<= cLIMIT) $ primesSoE()

固定点_Y组合器负责递归，其余部分相同。此版本将复杂度降低了一个 log n 因子，因此现在渐近复杂度为 O(n log n log log n).

如果真的想要效率，那么不要使用 List，而是使用可变数组。以下代码使用内置位压缩数组将 SoS 实现到固定范围：

{-# LANGUAGE FlexibleContexts #-}

import Control.Monad.ST ( runST )
import Data.Array.Base ( newArray, unsafeWrite, unsafeFreezeSTUArray, assocs )

primesSoSTo :: Int -> [Int] -- generate a list of primes to given limit...
primesSoSTo limit
  | limit < 2 = []
  | otherwise = runST $ do
      let lmt = (limit - 3) `div` 2 -- limit index!
      oddcmpsts <- newArray (0, lmt) False -- indexed true is composite
      let getbpndx i = (i + i + 3, (i + i) * (i + 3) + 3) -- index -> bp, si0
          cullcmpst i = unsafeWrite oddcmpsts i True -- cull composite by index
          cull4bpndx (bp, si0) = mapM_ cullcmpst [ si0, si0 + bp .. lmt ]
      mapM_ cull4bpndx
            $ takeWhile ((>=) lmt . snd) -- for bp's <= square root limit
                        [ getbpndx i | i <- [ 0.. ] ] -- all odds!
      oddcmpstsf <- unsafeFreezeSTUArray oddcmpsts -- frozen in place!
      return $ 2 : [ i + i + 3 | (i, False) <- assocs oddcmpstsf ]

cLIMIT :: Int
cLIMIT = 1000

main :: IO ()
main = print $ primesSoSTo cLIMIT

具有 O(n log n) 的渐近复杂度，以下代码对 Odds-Only SoE 执行相同的操作：

{-# LANGUAGE FlexibleContexts #-}

import Control.Monad.ST ( runST )
import Data.Array.Base ( newArray, unsafeWrite, unsafeFreezeSTUArray, assocs )

primesSoETo :: Int -> [Int] -- generate a list of primes to given limit...
primesSoETo limit
  | limit < 2 = []
  | otherwise = runST $ do
      let lmt = (limit - 3) `div` 2 -- limit index!
      oddcmpsts <- newArray (0, lmt) False -- when indexed is true is composite
      oddcmpstsf <- unsafeFreezeSTUArray oddcmpsts -- frozen in place!
      let getbpndx i = (i + i + 3, (i + i) * (i + 3) + 3) -- index -> bp, si0
          cullcmpst i = unsafeWrite oddcmpsts i True -- cull composite by index
          cull4bpndx (bp, si0) = mapM_ cullcmpst [ si0, si0 + bp .. lmt ]
      mapM_ cull4bpndx
            $ takeWhile ((>=) lmt . snd) -- for bp's <= square root limit
                        [ getbpndx i | (i, False) <- assocs oddcmpstsf ]
      return $ 2 : [ i + i + 3 | (i, False) <- assocs oddcmpstsf ]

cLIMIT :: Int
cLIMIT = 1000

main :: IO ()
main = print $ primesSoETo cLIMIT

渐近效率为 O(n log log n)。

由于减少了可变数组操作而不是列表操作中的常数因子执行时间，以及减少了 log n 因子渐近复杂度。

Haskell 中的 Sundaram 高效 (O(n^2)) 筛

Efficient (O(n^2)) Sieve of Sundaram in Haskell

optimization

primes

haskell

sieve

time-complexity