使用 Haskell 类型族或 GADT 的模块化算法？

Question

我经常有机会在 Haskell 中执行 mod 元运算，其中 modulus 通常很大并且通常是质数（例如 2000000011）。目前，我只使用像 (modAdd m a b)、(modMul m a b)、(modDiv m a b) 等函数。但这很不方便，需要一个额外的参数来始终指定并携带并以常规积分形式和 mod- 形式分别创建我的各种函数。

所以创建一个新的 class 像这样的东西可能是个好主意：

class Integral a => Mod a

m = 2000000011

instance Integral a => Num (Mod a) where
   ... defining (+), etc. as modulo m

然后可以使用常规函数执行常规算术，并定义有用的结构，例如

factorials :: [Mod Int]
factorials = 1:zipWith (*) factorials [1..]

但这有一个问题：Mod Int 类型的所有值必须具有相同的 modulus。但是，我经常需要在一个程序中使用多个 moduli（当然总是只组合相同 modulus 的值）。

我认为，但理解得不够好，无法确定，这可以通过这样的事情来克服：

class Integral a => Mod Nat a

其中 Nat 是一种以 Peano 方式对 modulus 进行编码的类型。这将是有利的：我可以有不同的 moduli 值，并且类型检查器可以避免我不小心组合这个值。

这样的事情可行且高效吗？如果我尝试使用那个 modulus，它会导致编译器或 RTS 尝试实际构造巨大的 (Succ (Succ (Succ ... repeated 2000000011 times)，使解决方案有效无用吗？ RTS 会尝试检查每个操作的类型匹配吗？每个值的 RTS 表示是否会被炸毁，否则可能只是一个未装箱的 int？

有没有更好的方法？

结论

感谢 cirdec, dfeuer, user5402, and tikhon-jelvis, I learned that (unsurprisingly) I was not the first to have this idea. In particular, there is a recent paper by Kiselyov and Shan that gives an implementation and tikhon-jelvis has posted to Hackage a solution called (surprise!) modular-arithmetic 的有用评论，它使用精美的 ghc pragmas 提供了更好的语义。

打开问题（给我）

幕后发生了什么？特别是，[Mod Int 2000000011] 的百万元素列表会携带 2000000011 的额外百万份吗？或者它是否编译成与单独携带 modulus 参数的百万 Int 列表相同的代码？后者就好了。

性能附录

我运行我正在处理的当前问题的一些基准。第一个运行使用了一个未装箱的 10,000 元素 Int 向量并对其执行了 10,000 次操作：

   4,810,589,520 bytes allocated in the heap
         107,496 bytes copied during GC
       1,197,320 bytes maximum residency (1454 sample(s))
         734,960 bytes maximum slop
              10 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6905 colls,     0 par    0.109s   0.101s     0.0000s    0.0006s
  Gen  1      1454 colls,     0 par    0.812s   0.914s     0.0006s    0.0019s

  TASKS: 13 (1 bound, 12 peak workers (12 total), using -N11)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    2.672s  (  2.597s elapsed)
  GC      time    0.922s  (  1.015s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time    3.594s  (  3.614s elapsed)

  Alloc rate    1,800,454,557 bytes per MUT second

  Productivity  74.3% of total user, 73.9% of total elapsed

对于第二个运行，我对未装箱的 10,000 Vector (Mod Int 1000000007) 执行了相同的操作。这使我的代码更简单一些，但花费了大约 3 倍的时间（同时具有几乎相同的内存配置文件）：

   4,810,911,824 bytes allocated in the heap
         107,128 bytes copied during GC
       1,199,408 bytes maximum residency (1453 sample(s))
         736,928 bytes maximum slop
              10 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6906 colls,     0 par    0.094s   0.107s     0.0000s    0.0007s
  Gen  1      1453 colls,     0 par    1.516s   1.750s     0.0012s    0.0035s

  TASKS: 13 (1 bound, 12 peak workers (12 total), using -N11)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    8.562s  (  8.323s elapsed)
  GC      time    1.609s  (  1.857s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time   10.172s  ( 10.183s elapsed)

  Alloc rate    561,858,315 bytes per MUT second

  Productivity  84.2% of total user, 84.1% of total elapsed

我想知道为什么会这样，是否可以修复。不过，我真的很喜欢 modular-arithmetic 包，并且会在性能不是绝对关键的地方使用它。

Answer 1

这是一些使用 Data.Reflection 的工作代码：

{-# LANGUAGE Rank2Types #-}
{-# LANGUAGE FlexibleContexts #-}

import Data.Reflection
import Data.Proxy

data M a s = M a -- Note the phantom comes *after* the concrete

-- In `normalize` we're tying the knot to get the phantom types to align
-- note that reflect :: Reifies s a => forall proxy. proxy s -> a

normalize :: (Reifies s a, Integral a) => a -> M a s
normalize a = b where b = M (mod a (reflect b)) 

instance (Reifies s a, Integral a) => Num (M a s) where
  M a + M b = normalize (a + b)
  M a - M b = normalize (a - b)
  M a * M b = normalize (a * b)
  fromInteger n = normalize (fromInteger n)
  abs _     = error "abs not implemented"
  signum _  = error "sgn not implemented"

withModulus :: Integral a => a -> (forall s. Reifies s a => M a s) -> a
withModulus m ma = reify m (runM . asProxyOf ma)
  where asProxyOf :: f s -> Proxy s -> f s
        asProxyOf a _ = a

runM :: M a s -> a
runM (M a) = a

example :: (Reifies s a, Integral a) => M a s
example = normalize 3

example2 :: (Reifies s a, Integral a, Num (M a s)) => M a s
example2 = 3*3 + 5*5

mfactorial :: (Reifies s a, Integral a, Num (M a s)) => Int -> M a s
mfactorial n = product $ map fromIntegral [1..n]

test1 p n = withModulus p $ mfactorial n

madd :: (Reifies s Int, Num (M Int s)) => M Int s -> M Int s -> M Int s
madd a b = a + b

test2 :: Int -> Int -> Int -> Int
test2 p a b = withModulus p $ madd (fromIntegral a) (fromIntegral b)

Answer 2

较新版本的 GHC 内置了类型级数字，这应该比您使用 Peano 算术自己滚动的数字更有效。您可以通过启用 DataKinds 来使用它们。作为奖励，您还将获得一些不错的语法：

factorials :: [Mod Int 20]

这是否有效取决于您如何实现 Mod 类型。在最一般的情况下，您可能只想在每次算术运算后 mod 。除非你处于一个热循环中，保存一些指令很重要，否则这应该没问题。（在热循环中，无论如何，最好明确说明何时 mod。）

我实际上在 Hackage 的一个库中实现了这种类型：modular-arithmetic。它有一个测试套件，但没有基准测试，所以我不能保证绝对的性能，但它没有做任何应该慢的事情，而且它对我的目的来说足够快了。（无可否认，这涉及到小的 moduli。）如果您尝试它并且运行遇到性能问题，我很想听听它们，以便我可以尝试修复它们。

使用 Haskell 类型族或 GADT 的模块化算法？

Modular Arithmetic using Haskell Type-Families or GADTs?

haskell

modular

ghc

type-families

gadt

结论

打开问题（给我）

性能附录